User talk:Inductiveload/Archives/2021

Please do not post any new comments on this page.
This is a discussion archive first created in 2021, although the comments contained were likely posted before and after this date.
See current discussion or the archives index.

The Great Gatsby questions

I am proofreading and validating pages from The Great Gatsby scanned file (the images). Should the punctuation and spelling be altered based on the scanned images? Windywendi (talk) 00:33, 2 January 2021 (UTC)

@Windywendi: yes, the spelling and punctuation should match the scan. This includes reproducing typographical errors, which you can mark with {{SIC}} if you want. Inductiveload—talk/contribs 00:46, 2 January 2021 (UTC)

vector.css

I downloaded this skin called DarkVector: User:PseudoSkull/vector.css. It's pretty cool and I implemented it to help me not get bad eyesight in 20 years of proofreading every day for Wikisource. It mostly works, but there are a few things it leaves the same, such as "User page" and "Discussion", "Read", "Edit", "History" etc. Also, the search bar is still white.

Any chance you could poke at the CSS for me and possibly fix these minor issues? Please and thank you! PseudoSkull (talk) 07:44, 2 January 2021 (UTC)

Discord

If you verify here (per usual process) that it was indeed you who joined the Discord server just now, I can make you an admin. PseudoSkull (talk) 21:26, 7 January 2021 (UTC)

Yep, that was me. Inductiveload—talk/contribs 10:13, 8 January 2021 (UTC)

Cleanup user css/js

Could you nuke User:Alexlur/common.css to get it off the broken redirects list? Dangling redirect created when renaming a user, that isn't active on enWS. --Xover (talk) 14:54, 8 January 2021 (UTC)

Kaboom, done. Inductiveload—talk/contribs 14:59, 8 January 2021 (UTC)

Suggest watchlist message re tag marking

Getting lots of hits on your markup filter. Would you please consider doing a Watchlist message to alert people so they are able to modify their editing behaviour, rather than just progress with edits unchanged. Noting that I got a throttle message from the system due to that filter. Thanks. — billinghurst sDrewth 12:40, 10 January 2021 (UTC)

Yes, I will write some proper documentation for this. I have also moved Mediawiki:Tag-deprecated markup and Mediawiki:Tag-deprecated markup-description in response to the changed tag name.

I'm not sure what you mean by a throttle message? There are only a few dozen hits per day on the tag. Inductiveload—talk/contribs 15:05, 10 January 2021 (UTC)

AbuseFilter is telling me through Notifications that "Abuse filter 41, which you recently edited, was throttled", so presumably too many edits were being checked and it was sucking resources (noting that I didn't check its stats at the time; probably should have). This is partly why I added some of the new conditions at the top, to lessen what it checks, and maybe I should exclude bots too. — billinghurst sDrewth 21:49, 10 January 2021 (UTC)

Weird, I have never seen such a message. When does that message show itself?

Ideally, this wouldn't need a filter at all and we could just use Special:LintErrors, but phab:T173944 isn't done yet (plus at least in RC we can see where such markup is being actively added). Inductiveload—talk/contribs 23:25, 10 January 2021 (UTC)

First time that I have seen it, probably due to my being trained (beaten!) WAY BACK to write filters tightly, with the eliminating conditions at the beginning. To note, I have now excluded bots from the filter, and I hazard a guess that a bot editing somewhere was the trigger, though there is no exact time nor greater precision on where the issue occurred. Apologies for not consulting about the term change from "bad" to "deprecated"; I could just see the confusion starting, and it is my experience that complaints will continue with connotative usage. Whitelists and blacklists will be renamed due to the negative connotations of the colour.

May wish to note anecdotal feedback in User talk:廣九直通車#Template:Center about other WSes, let alone other wikis. — billinghurst sDrewth 01:44, 11 January 2021 (UTC)

We should probably look at our introductory help pages like Wikisource:For Wikipedians and the like to talk about html tags being deprecated and to utilise css styles, preferably through existing templates. — billinghurst sDrewth 02:13, 11 January 2021 (UTC)

Philosophical Transactions/Volume 50

Hi. I have noticed that you have been working with the template {{TOC begin}} recently. At the same time the TOC at Philosophical Transactions/Volume 50 got jumbled for some reason. Is it possible that it has been caused by the changes in the template? --Jan Kameníček (talk) 22:22, 10 January 2021 (UTC)

@Jan.Kamenicek: Looks like this is because the last two pages of the TOC don't use table markup, and there was no {{TOC end}} after the table-like portion at the end of Page:Philosophical Transactions - Volume 050, part 1.djvu/17 (well, there was, but it was in the footer). I don't think this would have been related to whitespace between the templates. Inductiveload—talk/contribs 23:19, 10 January 2021 (UTC)

Ah, I see. The TOC was originally designed in a different way, but then somebody started redesigning it using this template and left it half-finished. (I personally prefer the previous version anyway.) --Jan Kameníček (talk) 23:22, 10 January 2021 (UTC)

And thanks for finding the problem! --Jan Kameníček (talk) 23:23, 10 January 2021 (UTC)

Ready for export?

Hi. I have noticed that you sometimes check works whether they are ready for export. May I ask you to have a look at R. U. R. (Rossum's Universal Robots)? It is not necessary, only if you have time… Thanks very much. --Jan Kameníček (talk) 11:14, 12 January 2021 (UTC)

@Jan.Kamenicek: Sure! So a quick check produces a few (small) issues:

Put page breaks after images to prevent the following content starting halfway down the page. I know we had trouble before where there were widows before a title, but for an image this is unlikely, and it's more likely you'll drag half of a title onto the previous page. For example: Special:Diff/10831082
- I think the smaller images in Acts I and II are OK as they are, since the size works well with surrounding text.
The {{block center}} should not be specified in px. On an e-reader, the display might be over 1000 pixels across (mine is 1072), but the font size is very large in terms of pixels (a visually impaired user may have a screen only 10 or 15em across!), so you restrict things unnecessarily. It is better to specify the layout in terms of em. Browsers generally have an em size of around 16px (unless the user configures it differently, e.g. for accessibility), so your 420px here is about 26em. That looks about the same in the browser, but looks better on the e-reader (the e-reader screen is <30em, so it is effectively 100%).

The body of the text renders very well, and the sections all seem to be present. Inductiveload—talk/contribs 12:02, 12 January 2021 (UTC)
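The arithmetic behind the px-versus-em advice can be sketched in a few lines of JavaScript. This is an illustration only: the 16px default and the e-reader figures (1072px wide, with an assumed 40px font) are example values, not measurements of any particular device.

```javascript
// Sketch: why em widths adapt to a reader's font size and px widths don't.
// Illustrative assumptions: a 16px browser default font, and an e-reader
// 1072px wide that uses a large 40px font.
function pxToEm(px, fontPx = 16) {
  return px / fontPx;
}

// Fraction of the screen a box takes up, capped at 1 as max-width:100% would.
function screenFraction(boxPx, screenPx) {
  return Math.min(1, boxPx / screenPx);
}

const screenPx = 1072;
const readerFontPx = 40;

const pxBox = screenFraction(420, screenPx);               // ignores the font size
const emBox = screenFraction(30 * readerFontPx, screenPx); // scales with the font

console.log(pxToEm(420));      // 26.25
console.log(pxBox.toFixed(2)); // 0.39
console.log(emBox);            // 1
```

At an assumed 40px per em, the 1072px screen is only about 27em wide, so the 30em box is capped at the full width, while the fixed 420px box uses about 39% of it no matter how large the text is.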

Thanks very much, I will try to keep this advice in my mind for other works too. I can see that you have corrected the issues mentioned above, or is there anything left? --Jan Kameníček (talk) 14:29, 12 January 2021 (UTC)

Yes, I think it's pretty much sorted now. I'll let you add the category! :-) Inductiveload—talk/contribs 15:14, 12 January 2021 (UTC)

Right aligned captions...

https://en.wikisource.org/w/index.php?title=Page%3ALetters_from_an_Oregon_Ranch.djvu%2F229&type=revision&diff=10842173&oldid=10153428

This worked, but I'd feel happier with a templated solution moving forward, so that if the HTML/CSS required ever changed I wouldn't have to change a vast number of Page:s. ShakespeareFan00 (talk) 15:43, 21 January 2021 (UTC)

@ShakespeareFan00: OK, so there were some nasty left-overs from the previous {{FI}} implementation. Now it should be fine to use divs in there. The problem was that it was trying to put a <div> (the {{center}}) in a <p>, and that's not cool (a <p> cannot contain block elements like <div>). Now the caption is just a div. Not sure what that was supposed to be a workaround for, but it looks long fixed (and the markup is way simpler now anyway). Inductiveload—talk/contribs 16:42, 21 January 2021 (UTC)

Reasoning

I'm curious what the reasoning is for this change: [1] --EncycloPetey (talk) 00:24, 22 January 2021 (UTC)

@EncycloPetey: I was reviewing it for export. Firstly, align="center" is obsolete HTML and is currently (due to a bug, as it turns out) exporting incorrectly (but this needs to be fixed "eventually" anyway), and while I was there I changed it over to some templates which apply CSS that was previously omitted. For example, the cells were not correctly vertically aligned for small screens (the left two columns should be top-aligned and the right column bottom-aligned), and the last column can wrap (sometimes even in-between the 1 and 3 of 135!) if you don't set white-space:nowrap;.

Furthermore, using constructs such as {{gap}} at the end of a cell line to force padding is also misguided, as it will not work when the line wraps (and the longest line always wraps first). The better approach is to set a width (using text-relative units like "em", never "px") that caps the table at a point that gives a suitable gap (in this case, 25em looks "OK" to me, though perhaps you may feel it is a little wide?) and let the built-in {{TOC begin}} style apply its default max-width:100% to prevent overspill on narrow screens, while still allowing the gap to close and not waste space when needed (remembering that a mobile or vision-impaired device may only have 10–15em of width in total).

The dots are just arrant frippery of course, but since I've fixed (most of?) the templates to not export the dots, they aren't causing the exporting havoc that they used to (though the markup is still "not ideal", it's not actively harmful anymore). If you're against them, s/2out/2/g will sort it out for you. Inductiveload—talk/contribs 00:51, 22 January 2021 (UTC)
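For anyone doing the `s/2out/2/g` cleanup in a script rather than with sed, a slightly safer JavaScript sketch only rewrites "2out" inside template names beginning with "TOC row", so the same letters elsewhere in the page text are left alone. The "TOC row ... 2out" naming is an assumption for illustration; check which templates the page actually uses before running anything like this.

```javascript
// Sketch: the s/2out/2/g cleanup, restricted to template names.
// Assumption: the dotted variants are named like "TOC row 2out" or "TOC row 1-2out".
function stripDotLeaders(wikitext) {
  // Capture "{{TOC row ..." up to "2out" (without crossing a "|" or "}")
  // and keep everything except the "out". A replacer function is used
  // instead of "$1" + "2" to avoid it being read as "$12".
  return wikitext.replace(/(\{\{\s*TOC row[^|}]*?)2out/g, (match, prefix) => prefix + "2");
}

const page = [
  "{{TOC row 2out|I.|Introduction|1}}",
  "{{TOC row 2out|II.|Of the Harbour|9}}",
].join("\n");

console.log(stripDotLeaders(page));
// {{TOC row 2|I.|Introduction|1}}
// {{TOC row 2|II.|Of the Harbour|9}}
```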

Thanks for the explanation. I may ask more questions later, since I hope (when I may have time in a few months) to go back through a lot of my older contributions, and Featured Texts, to ensure they are fully formatted for downloads. --EncycloPetey (talk) 19:21, 23 January 2021 (UTC)

I've written some guidance at Help:Preparing for export with some dos, don'ts and known issues. It's not yet complete (notably for TOCs), because I've been trying to address some of the issues at source rather than advising workarounds (for example, dotted TOCs are much less of an issue than they used to be). Please let me know if 1) I break anything somehow, or 2) Help:Preparing for export is too vague on something or 3) you would like an opinion on something (there's also {{export to check}} you can use to ask for a once-over). Inductiveload—talk/contribs 20:38, 23 January 2021 (UTC)

Context is King

T272704's task description needs a few "when exporting" and similar phrases sprinkled into it. The people working on ws-export will infer the context, but everyone else will be left scratching their heads. :) --Xover (talk) 12:53, 22 January 2021 (UTC)

NopInserter

BTW, while I remember: WS:S#Update to NopInserter Gadget. Current version at User:Xover/Gadget-NopInserter.js with some very minor additional changes. There's probably a couple more bells and whistles that could be added if anybody cared, but just fixing the bug is the main thing. --Xover (talk) 12:55, 29 January 2021 (UTC)

Done: I made the change; it should take effect once the gadget caches cycle. Inductiveload—talk/contribs 13:39, 29 January 2021 (UTC)

Scan-backing

Hi, I found British White Paper of Palestine 1939 which isn't linked to a source. There is File:1939 White Paper cmd 6019.djvu. Is this one you could fix please? DuncanHill (talk) 18:52, 29 January 2021 (UTC)

@DuncanHill: I created Index:1939 White Paper cmd 6019.djvu. The text is actually slightly different (numbers and references). Inductiveload—talk/contribs 20:24, 29 January 2021 (UTC)

Also, see User:Inductiveload/Sandbox/PP, which is something of a work-in-progress. It doesn't reach up to the 1930s, but there may be something there of interest to you. The best resource listed is http://www.cse.psu.edu/~deh25/post/Timeline_files/Papers-House-of-Commons.html, which has tons of Google Books links. You can easily import them from Google to the Internet Archive with https://bub2.toolforge.org/ and from there, let me know and I can more easily upload the volumes to Wikisource (more easily than directly from Google). Inductiveload—talk/contribs 21:29, 29 January 2021 (UTC)

a plainlist question

What is the proper format for the first line of this page?— Ineuw (talk) 22:04, 28 January 2021 (UTC)

I think you can just add <noinclude>*</noinclude> to the first line to fake a new list item. You don't get the "mid" indent, ~~but I suppose it might be possible to fix in CSS with {{plainlist/m}} if it's critical~~ {{plainlist/m}} can be used instead of {{plainlist/s}} to suppress the hanging indent on the first page (it won't make any difference in mainspace). Also note that apparently leaving a blank line splits the list up into 'n' lists of a single item each. @PseudoSkull: has been doing something similar recently, BTW. Inductiveload—talk/contribs 22:11, 28 January 2021 (UTC)

{{plainlist/m}} doesn't work, but don't fret it. I usually move the dangling paragraph end to the previous page, but here I am faced with "dangling" paragraphs 2-3 pages long. I don't think that merging them into one page is the right thing to do.

I also tried all the other hanging-indent templates, but I mostly use tables because they all have this problem. — Ineuw (talk) 03:39, 29 January 2021 (UTC)

It does work, but the first line still needs to be a list item in the Page namespace to receive the CSS. Use <noinclude>*</noinclude> to do that. Inductiveload—talk/contribs 13:38, 29 January 2021 (UTC)

Many thanks. Everything is working well. I checked the results in the main namespace. There is one exception to {{plainlist/m}}. It cannot be used for a paragraph which is joined by {{hws}} & {{hwe}}. This breaks the paragraph in the main namespace, so I use {{plainlist/s}}. Also, I do not enclose the list item (*). It does not affect the main namespace display.— Ineuw (talk) 13:54, 29 January 2021 (UTC)

You shouldn't need {{hws}} and {{hwe}} anymore. Even with it, it seems to work: History_of_Woman_Suffrage/Volume_5/Index#pageindex_804. Inductiveload—talk/contribs 13:57, 29 January 2021 (UTC)

Thanks again. I tested hyphenations as well without the template, and it works. I noticed that when using {{plainlist/m}}, it no longer indents the page namespace but it displays correctly in the main namespace.— Ineuw (talk) 18:42, 31 January 2021 (UTC)

@Ineuw: abracadabra! My bad - I left a sandbox stylesheet in the template so it worked fine...until I stomped in the sandbox. Inductiveload—talk/contribs 19:20, 31 January 2021 (UTC)

Thanks again.— Ineuw (talk) 00:33, 1 February 2021 (UTC)

Template that isn't printready

I will leave Template:Blife-Plate page with you, as I tidy other aspects of the work. It has some lovely images and it would be a waste to not have that one set for viewing. — billinghurst sDrewth 21:31, 1 February 2021 (UTC)

@Billinghust: I'll see what I can do. It'll need a bit of care to strike a balance between nice images in export, filesize and so on. Inductiveload—talk/contribs 09:20, 10 February 2021 (UTC)

And why do you think that I put it in front of someone competent, rather than having a go at it myself? I have to go back and look at a whole lot of stuff I have done over so many years. I think there is some clear POTM stuff from the early-mid 2010s where we looked at works with images and set them all with 500/600px widths. <shrug> — billinghurst sDrewth 22:33, 10 February 2021 (UTC)

500/600px can be OK, but they might poke out of a Layout 2. There is CSS wrangling with max-width:100%; done to avoid really bad things happening on export (and in the mobile view). Things are very slowly beginning to work by default. :-) Inductiveload—talk/contribs 22:40, 10 February 2021 (UTC)

adding enWS page link to the query

To

SELECT ?item ?label WHERE {
  ?item wdt:P1433 wd:Q19084840.
  MINUS { ?item wdt:P921 [] } .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en".
    ?item rdfs:label ?label.
  }
}

could you add something to provide the interwiki enWS link, so we can just click open the page? Thanks. — billinghurst sDrewth 01:48, 10 February 2021 (UTC)

@Billinghurst: Here you go:

SELECT ?item ?label ?article ?page_titleWS WHERE {
  ?item wdt:P1433 wd:Q19084840.
  MINUS { ?item wdt:P921 []. }
  OPTIONAL {
    ?article schema:about ?item;
      schema:isPartOf <https://en.wikisource.org/>;
      schema:name ?page_titleWS.
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en".
    ?item rdfs:label ?label.
  }
}

Inductiveload—talk/contribs 09:13, 10 February 2021 (UTC)
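A Wikidata Query Service link for a query like this is simply the SPARQL text, URL-encoded, after `https://query.wikidata.org/#`. A minimal JavaScript sketch for building such a link, so a tweaked query can be shared; the link format is the only assumption, and the function and constant names are illustrative.

```javascript
// Sketch: build a Wikidata Query Service link for a SPARQL query.
// Assumption: the service opens a query passed URL-encoded after "#".
const QUERY_SERVICE = "https://query.wikidata.org/#";

function queryServiceLink(sparql) {
  return QUERY_SERVICE + encodeURIComponent(sparql);
}

const query = `SELECT ?item ?label ?article ?page_titleWS WHERE {
  ?item wdt:P1433 wd:Q19084840.
  MINUS { ?item wdt:P921 []. }
  OPTIONAL {
    ?article schema:about ?item;
      schema:isPartOf <https://en.wikisource.org/>;
      schema:name ?page_titleWS.
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en".
    ?item rdfs:label ?label.
  }
}`;

const link = queryServiceLink(query);
console.log(link);
```

Opening the printed link should load the query in the Query Service editor; in the results, the ?article column is the enWS page URL, which is the clickable link asked for.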

Important

!important should be reserved for user stylesheets. Do we really need it in site styles? It makes it effectively impossible for a user to override, both in on-wiki user styles and in UA styles. --Xover (talk) 14:09, 10 February 2021 (UTC)

@Xover: I thought we did, but the same effect can be achieved with being more specific by adding .minerva__tab-container. Inductiveload—talk/contribs 14:41, 10 February 2021 (UTC)
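Why a more specific selector can stand in for !important comes down to CSS specificity, which is compared as (ids, classes/attributes/pseudo-classes, types/pseudo-elements), column by column. The JavaScript sketch below is deliberately simplified (no :not(), :is(), :where(), or quoted strings containing spaces or dots), and `.example-tab` is a hypothetical selector standing in for the real rule.

```javascript
// Simplified CSS specificity calculator (sketch). It only understands
// id, class, attribute, pseudo-class, pseudo-element and type selectors
// joined by whitespace, >, + or ~. It does NOT handle :not()/:is()/:where(),
// quoted strings containing spaces or dots, or other edge cases.
function specificity(selector) {
  let ids = 0, classes = 0, types = 0;
  for (const compound of selector.trim().split(/\s*[\s>+~]\s*/)) {
    if (!compound) continue;
    ids += (compound.match(/#[\w-]+/g) || []).length;
    classes += (compound.match(/\.[\w-]+/g) || []).length;
    classes += (compound.match(/\[[^\]]*\]/g) || []).length;
    // Single-colon pseudo-classes, skipping "::" pseudo-elements.
    classes += (compound.match(/(?<!:):(?!:)[\w-]+/g) || []).length;
    types += (compound.match(/::[\w-]+/g) || []).length;
    if (/^[a-z]/i.test(compound)) types += 1; // leading type selector
  }
  return [ids, classes, types];
}

// Positive if selector `a` beats selector `b` (compared column by column).
function compareSpecificity(a, b) {
  const [x, y] = [specificity(a), specificity(b)];
  for (let i = 0; i < 3; i++) {
    if (x[i] !== y[i]) return x[i] - y[i];
  }
  return 0;
}

console.log(specificity(".example-tab"));                         // [ 0, 1, 0 ]
console.log(specificity(".minerva__tab-container .example-tab")); // [ 0, 2, 0 ]
console.log(compareSpecificity(".minerva__tab-container .example-tab", ".example-tab") > 0); // true
```

Adding the parent class wins the cascade for ordinary declarations while leaving users free to override it with their own, even more specific, rules.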

Headers and footnotes

Thanks for the welcome message. I have two early questions. The first is a simple one, on the page headers. The book that I have started transcribing has a header text that alternates format between the odd and even-numbered pages, so

The Voyage of Italy.  Part I.

and

Part I.  The Voyage of Italy.

Can I use the first form throughout, or should I alternate them as the original does?

Second, much more interesting, is on footnotes. I've read the guidance, but I'm not sure if the footnotes are only to duplicate footnote texts that are already in the source text, or whether they can include new explanatory notes by the transcriber. In researching some of the words/places/persons mentioned, I would be able to add a footnote to give the current name, a clarification, or the correct spelling of an old form or an original typo. E.g. the text mentions "goistre" (for "goitre") and a "Monsieur Esselin" (actually fr:Louis Hesselin, intendant des plaisirs du roi (the king's superintendent of entertainments), 1600-1662). Adding the correct/modern text in the footnote would at least make the text searchable. Is this something that can be done? Scarabocchio (talk) 15:05, 11 February 2021 (UTC)

@Scarabocchio: re the running headers, we do put them the right way around. There is a gadget to help you: Help:Gadget-RunningHeader, or there are templates that auto-flip the side based on the page number. In either case, you need to take care of the section name change. @PseudoSkull: has a bot for this, perhaps they can help (then you can just leave them out and bot them in later; honestly, doing it manually is a bit of a waste of human life IMO).

Re the footnotes: these are considered WS:Annotations, and generally speaking, we don't put them in. However, it is allowed to link to authors and works at Wikisource, so the names are easy, and a very small number of links to Wiktionary for really odd words (like "goistre") are also fine, but often the word isn't at Wiktionary. Again, PseudoSkull can help you there; he's a Wiktionary person too. We do have {{User annotation}}, which you can use to mark your own, more detailed, annotations in footnotes. I don't find the concept objectionable at all (it's decent value-add to me), but it would be worth asking for clarification on the WS:Scriptorium, since that's probably not a universal opinion. Inductiveload—talk/contribs 15:29, 11 February 2021 (UTC)

@InductiveLoad: Many thanks. On the running header, the text requires a three part header: <pageno>+"Voyage of Italy"+"Part I", or the alternate "Part I"+"Voyage of Italy"+<pageno>. The examples at Help:Gadget-RunningHeader show it working with a two-part header. Can it work with three parts?

The guidance in WS:Annotations implies that an original, unannotated version should exist before any annotations are added. I'll carry on transcribing just the base text for a while, keeping my own annotations separate, to confirm that I am going to carry on working here. Scarabocchio (talk) 16:22, 11 February 2021 (UTC)

@Scarabocchio: it should work with three parts as well (let me know if it doesn't, that's a bug). Inductiveload—talk/contribs 18:55, 11 February 2021 (UTC)
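For the three-part alternating header asked about above, picking the order is just a parity check on the page number. A minimal JavaScript sketch that emits {{rh}} wikitext; which parity gets the page number on the left is an assumption (even = left here) that must be checked against the scan, and this is not how the gadget or the auto-flipping templates are actually implemented.

```javascript
// Sketch: three-part running header that alternates with page parity.
// Assumption: even pages put the page number on the left, odd pages on
// the right. Swap the branches if the scan does the opposite.
function runningHeader(pageNo, title, section) {
  return pageNo % 2 === 0
    ? `{{rh|${pageNo}|${title}|${section}}}`
    : `{{rh|${section}|${title}|${pageNo}}}`;
}

for (const page of [12, 13]) {
  console.log(runningHeader(page, "The Voyage of Italy.", "Part I."));
}
// {{rh|12|The Voyage of Italy.|Part I.}}
// {{rh|Part I.|The Voyage of Italy.|13}}
```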

DIV span swaps...

https://en.wikisource.org/wiki/Special:LintErrors/misc-tidy-replacement-issues?namespace=104

I got this down to under a page, but the remaining reported swaps require someone with more expertise to resolve.

Once these are resolved, quite a few of the remaining "DIV span swaps" in main-space are down to the REF wrapping issue which is a known issue.

I am also reverting some of my changes in respect of Index:Beowulf_(Wyatt).djvu made previously to allow someone with more expertise to come up with a stable long term solution, because the current {{lang block}} doesn't format consistently for this work. ShakespeareFan00 (talk) 10:21, 12 February 2021 (UTC)

Template:Letter-spacing

Thanks..

You might also need to check for the situation of {{lsp||Text to use default spaci|ng}} in the template logic.

See User:ShakespeareFan00/foo/testcases and the template code called, for potential ideas on how to handle this very robustly.

Generally the required output for cases of @@absent@@ and @@empty@@ will be the same in many instances, though. ShakespeareFan00 (talk) 22:14, 13 February 2021 (UTC)

@ShakespeareFan00: Does it not work?

Example

* {{lsp||Text to use default spaci|ng}}
* {{lsp|1=|2=Text to use default spaci|3=ng}}
* {{lsp|0.15em|Text to use default spaci|ng}}

Text to use default spacing
Text to use default spacing
Text to use default spacing

Looks the same to me. Parameter 1 doesn't care whether param 3 is present or not. Smart logic to allow {{lsp|Text to use default spaci|ng}} and notice that parameter 1 is not a valid CSS size is technically possible, but probably more confusing than just allowing a blank parameter.

My plan is to work to remove the custom spacing from {{sp}} and short-cut that to be {{lsp||{{{1}}}|{{{2|}}}}}, so it can be used for the default case and get the "un-spaced tail" capability. Inductiveload—talk/contribs 22:21, 13 February 2021 (UTC)
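The parameter handling discussed here (absent and empty parameter 1 behave the same; parameter 3 is an un-spaced tail) can be sketched in JavaScript for illustration. The real {{lsp}} is wikitext; the 0.15em default is an assumption inferred from the explicit 0.15em test case rendering identically, and the HTML shape and the `sp` helper are hypothetical approximations, not the template's actual output.

```javascript
// Sketch (JS illustration only; the real {{lsp}} is wikitext).
const DEFAULT_SPACING = "0.15em"; // assumption, inferred from the test cases

function lsp(spacing, text, tail) {
  // Treat an absent (undefined) and an empty/blank parameter 1 the same way.
  const value =
    spacing === undefined || spacing.trim() === "" ? DEFAULT_SPACING : spacing.trim();
  // Parameter 3 is the "un-spaced tail": it follows the spaced span unstyled.
  return `<span style="letter-spacing:${value}">${text}</span>${tail ?? ""}`;
}

// Hypothetical {{sp}} shortcut: always default spacing, like {{lsp||text|tail}}.
function sp(text, tail) {
  return lsp("", text, tail);
}

console.log(lsp("", "Text to use default spaci", "ng"));
console.log(lsp(undefined, "Text to use default spaci", "ng"));
console.log(lsp("0.15em", "Text to use default spaci", "ng"));
// All three print:
// <span style="letter-spacing:0.15em">Text to use default spaci</span>ng
```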

For me test case1 in your examples looked slightly different, I am thinking that may be a cache issue. ShakespeareFan00 (talk) 22:31, 13 February 2021 (UTC)

Aside, There's not an easy mechanism to trap for px vs em based values is there? I seem to recall you stating elsewhere that px values in templates were deprecated in favour of em based ones which scale better for mobile devices. ShakespeareFan00 (talk) 22:31, 13 February 2021 (UTC)

@ShakespeareFan00: There's {{CSS unit}} which you can use with a #switch or #ifeq to catch naughty px values. {{AuxTOC}} currently catches them and adds Category:Pages_using_pixel_widths. 287 sinners right now.

Do you remember the name of the enWP tool that can search through all template parameters? Inductiveload—talk/contribs 22:35, 13 February 2021 (UTC)
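To illustrate the px check mentioned above outside of wikitext, here is a JavaScript sketch that pulls the unit off a CSS length and flags pixel values, e.g. for a tracking category. The regex and function names are assumptions for illustration, not how {{CSS unit}} or {{AuxTOC}} work internally.

```javascript
// Sketch in the spirit of {{CSS unit}} + #switch: extract the unit of a CSS
// length so pixel widths can be flagged. Accepts forms like "420px", "25em",
// "1.5rem", ".5em", "50%"; anything else returns null.
function cssUnit(value) {
  const m = /^\s*-?(?:\d+(?:\.\d*)?|\.\d+)([a-z]+|%)?\s*$/i.exec(String(value));
  if (!m) return null;
  return (m[1] || "").toLowerCase(); // "" for a bare number
}

function usesPixelWidth(value) {
  return cssUnit(value) === "px";
}

for (const v of ["420px", "25em", "50%", "wide"]) {
  console.log(v, cssUnit(v), usesPixelWidth(v));
}
```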

I don't. Sorry. ShakespeareFan00 (talk) 23:04, 13 February 2021 (UTC)

In case you wanted to focus your bot on something specific - https://petscan.wmflabs.org/?psid=18438860 was all the Page: namespace instances. As many of these will be transcluded, clearing these first should reduce the main list a bit? ShakespeareFan00 (talk) 10:19, 14 February 2021 (UTC)

Right-aligned header

Hi. Is it necessary to right-align the header? Imo it does not look nice. --Jan Kameníček (talk) 15:16, 17 February 2021 (UTC)

@Jan.Kamenicek: It was already right-aligned in the Site CSS (look for .gen_header_forelink). The only place it was not right aligned was on mobile (which didn't matter much because previously it was crushed into 20% width, which is very narrow on a mobile screen). Inductiveload—talk/contribs 15:34, 17 February 2021 (UTC)

Maybe it is caused by something different, but up to now the headers of author pages were centered, while today the names of authors and their dates appear much further to the right; see e.g. František Lützow or any other author page. --Jan Kameníček (talk) 16:01, 17 February 2021 (UTC)

@Jan.Kamenicek: oh, I see. That actually was a different thing, not the alignment. I've gone back to hardcoding the widths on wide screens to 20:60:20, and only do "free width" on small screens after wrapping. Thanks for the heads up :-) Inductiveload—talk/contribs 16:11, 17 February 2021 (UTC)

Footer (slightly) borked

I'm seeing the footer missing the back/forward arrows, and its sizing is a hair off (too wide on the right). It probably needs adjusting after your changes to the markup of the {{header}}. Doesn't look like it's broken enough that most people will notice, so you can probably just stash it on the todo until your header modifications are stable.

PS. I'm giddy to see the header getting cleaned up! --Xover (talk) 17:08, 17 February 2021 (UTC)

@Xover: I've added the arrows back in, they're no longer in the #headerprevious/next element so they weren't being picked up.

I don't think the width has been affected by what I did. The header uses a fixed 20:60:20 width ratio (it always did; the new changes should only kick in on small screens). The footer uses a float method, much like {{rh}}, which means that the central "cell" might well not be centred if the two sides aren't the same width.

I might try a flexbox approach for the footer too at some point.

Glad you like it so far. Module:Header is slowly accreting functions and I don't think it's exploding anywhere. Inductiveload—talk/contribs 17:24, 17 February 2021 (UTC)
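The float behaviour described above comes down to simple arithmetic. With the two side cells floated, the middle text is centred in the space left between them, so it drifts off the true centre by half the difference between the side widths. A JavaScript sketch of that calculation, with widths in any one consistent unit and made-up example numbers:

```javascript
// Worked arithmetic for a float layout. The middle content is centred in
// the space from `left` to `total - right`, so its midpoint is
// (left + total - right) / 2, which differs from the true centre (total / 2)
// by (left - right) / 2.
function centreOffset(left, right, total) {
  const middleMidpoint = (left + (total - right)) / 2;
  return middleMidpoint - total / 2; // equals (left - right) / 2
}

console.log(centreOffset(10, 10, 60)); // 0 (equal sides: centred)
console.log(centreOffset(14, 8, 60));  // 3 (pushed 3 units to the right)
```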

Given the arcane mess it was, the changes have been remarkably unexplody, yes. Very nicely done! --Xover (talk) 17:27, 17 February 2021 (UTC)

Hmm. Looking closer I see the various horizontal boxes all have "ragged" right edges. Probably worth looking into. Eventually. --Xover (talk) 18:41, 17 February 2021 (UTC)

@Xover: Which ones? Is it a new thing? Inductiveload—talk/contribs 18:44, 17 February 2021 (UTC)

Not sure. Never noticed before, but I'm inclined to think that's just because I haven't really looked that close until the recent changes. I may be being fooled by the sister project links in the notes field (I'll have to throw up some grid lines to be sure). The footer is off by half an em or something relative to the categories box. And eyeballing it on one page the header looked like it was also off by a similar amount, but now you ask I'm no longer sure. --Xover (talk) 19:11, 17 February 2021 (UTC)

Hmm, I have noticed that before, but never investigated. Looks like a stray 100% width in Mediawiki:Gadget-Site.css for .footertemplate. I think this'll fix it.

The header looks in-line to me, but the plain sister is inset a bit within the notes field (because the plain sister has 0.5ex of margin on all sides). The shortcuts box doesn't, so I think just removing the right-side margin will line things up better. Inductiveload—talk/contribs 19:21, 17 February 2021 (UTC)

Yeah, that did the trick. --Xover (talk) 20:31, 17 February 2021 (UTC)

Missing controls in index_preview

Hi. Added mw.loader.load('//en.wikisource.org/w/index.php?title=User:Inductiveload/index_preview.js&action=raw&ctype=text/javascript');. I see the thumbnail of a single page but the said controls do not show anywhere in the frame.— Ineuw (talk) 02:40, 19 February 2021 (UTC)

This is the display. What could I be doing wrong? — Ineuw (talk) 05:38, 19 February 2021 (UTC)

@Ineuw: For the moment, the controls only appear when the page does not already exist. Adding them requires some more UI logic to ask the user if they're sure and if they want to replace/append to existing content. Inductiveload—talk/contribs 09:44, 19 February 2021 (UTC)

Got it, Thanks.— Ineuw (talk) 16:23, 19 February 2021 (UTC)

I hope you don't mind my testing the script and uploading screenshots. I clicked twice on the same page, with a different control. Could not escape from my error and the tab froze, but not the browser (Firefox). — Ineuw (talk) 17:02, 19 February 2021 (UTC)

Weird, that works for me. I'll keep an eye out. Inductiveload—talk/contribs 17:41, 19 February 2021 (UTC)

Perhaps there is something in my User:Ineuw/vector.js that interferes with the script? — Ineuw (talk) 19:23, 19 February 2021 (UTC)

Template:Header/main block

I see you were re-working the header templates..

I have a request..

It relates to - Folk-Lore/Volume 2/Legends of the Lincolnshire Cars, Part 1 header.

Here the section tag is used for the Issue and for the Article name. This works, but leads to a long section name if the line-feed is removed to resolve a lint error.

What would be better is to have an Issue, article title separation, but that might need a new {{periodical contribution header}} to pre-process? ShakespeareFan00 (talk) 12:10, 19 February 2021 (UTC)

Hold on for the moment and we'll come back to it when the logic is all in the module, otherwise it's just hacks on top of hacks. Inductiveload—talk/contribs 12:12, 19 February 2021 (UTC)

Kitchen sinks…

cf. this comment. If we have to nerf taint checking in {{header}} because {{versions}} does something funky, my immediate thought is that we need to split the code further to keep the funkiness from complicating the code for {{header}}. Does it look like there's potential for separating presentation from parameter handling from metadata from structure, or some other sensible ontology? I mean, even if we keep all the special-casing and such it'll be better by the mere fact of being in Lua, but it'd be nice to excise these warts when we have the opportunity. :)

PS. Module:Arguments. --Xover (talk) 13:44, 19 February 2021 (UTC)

@Xover: my thinking is to port {{header}}'s logic to Lua essentially as-is first (maybe with minor tweaks), then worry about tidying up once I have it as code that can be reasoned about.

Since {{versions}} calls {{header}} and manually inserts formatting like {{Font-weight-normal|Versions of}}<br />''{{{title|{{PAGENAME}}}}}'' into the header field, there's not much we can do right now other than just exclude them. Once the header module is looking tidier, we might consider another parameter to allow {{versions}} to do this in a saner way. We might consider that {{versions}} shouldn't call {{header}} itself, but everything calls {{header/base}}, a separate invocation of Module:Header or something else. Exactly how it shakes out is a task after moving the logic from the "title cell" of the header, which is my primary goal right now (so we can fix junk like double spacing and missing commas).

Re separation of structure, that's what {{header/main block}} is aiming for (rather than cramming it all into Module:Header and so it can be harmonised with other headers).

Module:Arguments looks handy, I'll check it out.

And thanks for fixing that talk page, no idea what happened there :-/ Inductiveload—talk/contribs 14:05, 19 February 2021 (UTC)

Yeah, getting into a state that is amenable to… well, anything, really… is the critical thing (you have no idea how grateful I am that I don't have to touch that mess!). The comment just tripped a red flag for me so I wanted to give you a poke while you have your hands deep in its guts.

enWP have developed quite a bit of Module infrastructure that makes Lua a lot less primitive to work with. Arguments, Yesno, No globals, and utilities for working with categories, etc. And a lot of the modules (unlike the templates) are not too tied to enwp-specific logic. They're not really built to be easy to fork and keep in sync, but a surprisingly large portion can be used as-is. --Xover (talk) 14:20, 19 February 2021 (UTC)

Making hidden layouts visible...

Check out what I added to my common.css. That, with your highlighter styles, is starting to look powerful.

What would be nice is a way to add some javascript to 'toggle' this on or off.

Known issues -

Currently I have no easy mechanism for limiting it to 'Content' pages; Talk pages look very weird when viewed with the current ruleset.
I should really set a rule for DL usage on talk pages, as that usage is so widespread as to be the de facto standard.
No detection for tables currently.
Symbol set used probably needs to look more like something LibreOffice uses to show non-printing characters.

etc.

ShakespeareFan00 (talk) 17:55, 21 February 2021 (UTC)

@ShakespeareFan00:

JS toggling is easy, just add/remove a class to mw-parser-output and target that with CSS. You can even have multiple toggles.
Content pages are easy, because they have even namespace numbers. Load the CSS from the JS to be able to do if (ns % 2 === 0) { loadCSS(); }
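
A minimal sketch of both suggestions as a user script (e.g. in Special:MyPage/common.js). The class name `show-layout`, the stylesheet title `User:Example/layout.css` and the toolbox link are hypothetical; `mw.config.get('wgNamespaceNumber')`, `mw.loader.load` and `mw.util.addPortletLink` are standard MediaWiki front-end calls. Outside MediaWiki the wiring is skipped and only the two helper functions are defined.

```javascript
// Hypothetical user script: toggle "hidden layout" markers on content pages.

// Subject namespaces (main = 0, User = 2, Page = 104, ...) are even and
// non-negative; their talk namespaces are odd. Special (-1) and Media (-2)
// are negative, and in JS -2 % 2 is -0, which === 0, so also check ns >= 0.
function isSubjectNamespace(ns) {
  return Number.isInteger(ns) && ns >= 0 && ns % 2 === 0;
}

// Class the stylesheet would target, e.g.
// ".mw-parser-output.show-layout p::after { ... }" (assumed name).
const TOGGLE_CLASS = 'show-layout';

// Flip the class on the content container; returns true if it is now on.
function toggleLayout(container) {
  return container.classList.toggle(TOGGLE_CLASS);
}

// Only wire things up when actually running inside MediaWiki.
if (typeof mw !== 'undefined' && typeof document !== 'undefined') {
  if (isSubjectNamespace(mw.config.get('wgNamespaceNumber'))) {
    // Load the CSS only on subject pages, so talk pages are never affected.
    mw.loader.load(
      '/w/index.php?title=User:Example/layout.css&action=raw&ctype=text/css',
      'text/css'
    );
    mw.loader.using('mediawiki.util').then(function () {
      const item = mw.util.addPortletLink(
        'p-tb', '#', 'Toggle layout markers', 't-toggle-layout'
      );
      if (!item) return; // toolbox portlet not present in this skin
      item.addEventListener('click', function (e) {
        e.preventDefault();
        const content = document.querySelector('.mw-parser-output');
        if (content) toggleLayout(content);
      });
    });
  }
}
```

Multiple toggles would just be more classes and more portlet links following the same pattern.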

Inductiveload—talk/contribs 18:33, 21 February 2021 (UTC)

~~I have not written JS code before. Do you have examples?~~ ShakespeareFan00 (talk) 19:54, 21 February 2021 (UTC)

On second thoughts, I am probably too inexperienced to even understand them at present. ShakespeareFan00 (talk) 20:03, 21 February 2021 (UTC)

Septuagint (Brenton 1879)

For the project Septuagint (Brenton 1879), is it OK to leave it in its present form, which is a little easier to navigate, and to import data from elsewhere?

Doing things from the original book means the Bible data is organized by page and it requires the data to be re-transcribed. Bobdole2021 (talk) 20:09, 24 February 2021 (UTC)

@Bobdole2021: it really should be transcribed against the original so it can be proofread side by side with the book. However, "continuation" table rows are a bit of an issue that I haven't thought of a tidy solution for, and you will need them. There are work-arounds, but they are rather messy.

In general, we are trying to reduce the number of works that are not backed by scans wherever possible. They don't need to be retranscribed; the existing content just needs to be moved to the Page namespace.

It would still be organised and presented in the mainspace as it is now. Inductiveload—talk/contribs 21:43, 24 February 2021 (UTC)

OK, that's fine. I'll both set things up in the mainspace from existing data, and work on figuring out how to put it into a page-by-page setup so I can double-check it against the original. Sounds good. Bobdole2021 (talk) 22:43, 24 February 2021 (UTC)

Quantum categories

Just now, while the file is deleted at Commons (undel request opened), Index:Login USENIX Newsletter feb1983.djvu shows that it is in Category:Pages with missing files. Naturally enough. However, looking at the category page the index doesn't actually show up there. I'm flummoxed. You got any ideas?

The only things I can think of are either PRP is doing something non-standard, or MW has a configuration that excludes certain namespaces from showing up in a category. But neither one seems obviously plausible in light of the fact that the category does show on the Index: page. --Xover (talk) 10:38, 24 February 2021 (UTC)

@Xover: whoops forgot to save this. Looks like it was a caching issue and a hard purge did the trick. Inductiveload—talk/contribs 19:30, 25 February 2021 (UTC)

No, what you're seeing is presumably because the file was undeleted at Commons. I purged both the Index: and the Category: and still the category was listed on the Index: page but the page was not listed in the Category:. I guess I'll have to set up some test pages and see if it's reproducible. --Xover (talk) 21:49, 25 February 2021 (UTC)

I did a hard purge (with the purge gadget) yesterday and it seemed to work. I was on my phone and forgot to finish the message to tell you because I got distracted by real life annoyingly happening. I often see categories not filling, generally it resolves in the end, but I've never timed it. Inductiveload—talk/contribs 22:33, 25 February 2021 (UTC)

If one set of purges didn't loosen it I kinda doubt the next ones did. But some indirect categories (transclusion, MW set cats, etc.) are updated on a periodic basis, either literally or figuratively by a cron job, so that is certainly one possibility. I'm just not quite buying it since the file was deleted last November and it still hadn't updated yesterday. It could be purge + time, of course, rather than just time, but that's not an effect I've noticed in MW before. Incidentally, the file was restored just over two hours after I posted here so that's a pretty narrow window. --Xover (talk) 23:32, 25 February 2021 (UTC)
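
For reference, a sketch of a purge that also asks MediaWiki to rebuild a page's links tables, using the action API's `purge` module with its `forcelinkupdate` flag. Category listings are built from each member page's entries in the categorylinks table, so it is the Index: page, not the Category: page, that needs the link update. Whether the purge gadget does exactly this is not known here, and it would not help if the delay lies in deferred jobs.

```javascript
// Build action=purge parameters that also rebuild the links tables
// (categorylinks among them), not just the cached rendering.
function buildPurgeParams(titles) {
  return {
    action: 'purge',
    titles: titles.join('|'),
    forcelinkupdate: 1
  };
}

// Usage inside MediaWiki; purge must be sent as a POST request.
if (typeof mw !== 'undefined') {
  mw.loader.using('mediawiki.api').then(function () {
    return new mw.Api().post(buildPurgeParams([
      'Index:Login USENIX Newsletter feb1983.djvu'
    ]));
  }).then(function (data) {
    console.log(data.purge); // one result object per purged title
  });
}
```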

charles albert buck is full of mistakes/lubek's third chess castle

The following discussion is closed:

End of discussion. Have a nice life, but do it elsewhere. :-)

dude, you are reverting correct info, year of birth is incorrect!

also he foretold the third chess rotation: correct year of birth 1868; third chess castle:

it.wikipedia.org/w/index.php?title=Discussione:GNU_Chess&diff=118899384&oldid=55444179

so, instead of vandalizing, where to post that info, on what chess talk page if not on his? —unsigned comment by 124.171.129.129 (talk) .

@124.171.129.129: None of this is evidence of his date of birth. This is also not a place for original research, or promotion of a chess move. If you have an original, published, public domain document related to the "third chess rotation" (whatever that means), feel free to present it. Otherwise, there is no place at Wikisource for this material. If you continue to spam this stuff, you will continue to be blocked. Inductiveload—talk/contribs 17:49, 25 February 2021 (UTC)

go on findagrave for proof,i gave it and was deleted SO IM NOT GIVING IT AGAIN, type his name and you will see im correct, give me link where i can present third chess castle! —unsigned comment by 175.34.229.70 (talk) .

That's a different Charles A. Buck. https://www.findagrave.com/memorial/20259001/charles-a-buck is ours. If you think it's the wrong one, you can explain why you think so. According to https://www.scribd.com/document/337601613/Spinrad-Charles-Buck, it is the right one. If your information about "third chess castle" is not from a public domain document, we can't have it here. If you write it up properly, perhaps Wikibooks would take it, but you'll have to do better than the current spam. Sorry. Inductiveload—talk/contribs 18:09, 25 February 2021 (UTC)

No, wikibooks does not care and lubek's third chess castle is played in some countries; also be careful of unsavory characters; they are reported on: wikipediasux /forum/viewtopic.php?f=10&t=1333&p=19413&sid=d276c4732d977b481e173f1a7b4258c6#p19413 (this circus is already live across www and you or anybody else across wmf wont be able to alter it) who will try to remove our conversation, if it does, dont allow others to play with your page, that will show your high level of low self-esteem: i will create a user account here and under my space i will write about the castle. What happened to public domain, it used to be up to january 1 1923, now it's 1926? MY STUFF IS NOT SPAM, BUT HIGHLY EDUCATIONAL MATERIAL AND THEN SOME... —unsigned comment by 118.210.49.156 (talk) .

Public domain in the US is set at 95 years ago. Thus it moves forward one year, every year. 2021 - 95 = 1926.
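
A minimal sketch of that arithmetic, limited to the rule in play here: US works published 1926-1977 get 95 years from publication, counted to the end of the calendar year, and everything published before 1926 is already out of copyright in the US. It deliberately ignores notice/renewal rules, unpublished works and post-1977 terms; the function names are illustrative.

```javascript
// On 1 January of `year`, works published before this year are PD by age.
function usPublicDomainCutoff(year) {
  return year - 95;
}

// True if a work published in `pubYear` is PD in the US on 1 January `year`
// purely by age. False only means "not PD by age alone": missing notice or
// renewal can still make a newer work PD.
function isPdByAge(pubYear, year) {
  return pubYear < usPublicDomainCutoff(year);
}

console.log(usPublicDomainCutoff(2021)); // 1926
console.log(isPdByAge(1925, 2021));      // true
console.log(isPdByAge(1926, 2021));      // false
```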

Please do not add your "educational material" here. This is not a chess strategy forum. If your contributions are not good-faith efforts at transcribing public domain texts, then, given your history, they will continue to be blocked on sight. You had your chance for the benefit of the doubt, and the fact you have used three IPs for this conversation hardly fills me with confidence. Inductiveload—talk/contribs 18:53, 25 February 2021 (UTC)

no, you never gave me nor do you give anybody benefit of the doubt like rest of wikipedoians, my castle statement is here anyway from long ago you wont find it and i dont need to post it again and it is across other wikis and it goes to show how ignorant and stupid you are when it comes to ip: there are dynamic and static IPs, DUH!!! can you post full story to: www.scribd.com/document/337601613/Spinrad-Charles-Buck; also there is plenty of evidence his book was published on january 1: yet your pals erased that date, thus making wikisource articles inacurrate again, again and again: en.wikisource.org/w/index.php?title=Paul_Morphy:_His_Later_Life&diff=10604774&oldid=10414686

If I'm so ignorant and stupid that I think dynamic IPs moving from Sydney to Canberra to Perth randomly is suspicious, then I must be far too stupid to help you. Sorry. Good luck bringing your chess move to the world, but it's not going to happen at Wikisource. Inductiveload—talk/contribs 19:22, 25 February 2021 (UTC)

and my ips are australian and they change and they could be out of australia as professionaly defined and you just sait it right how stupid you are and all wikipedoians: whatismyipaddress.com/dynamic-static

Source tab missing in document

Hi Inductiveload, I've noticed that the source tab and the side-page links that point to the source DjVu pages are missing on the chapter pages for the translation I've been working on (and which you so graciously helped me get started with): Translation:Writings of Novalis. The problem is with all the chapters, so I figure it's in the Index page code or the TOC. I'm wondering if I may have inadvertently disrupted the code in some way. Could you guide me on how to address this? Thank you for your help! Wtfiv (talk) 06:22, 10 February 2021 (UTC)

@Wtfiv: Nice work! There is a bug in the ProofreadPage system that only shows the source tab in the main namespace: phab:T203102. It's been open since 2013 (yes, really), so I'll try to arrange a local stop-gap. Inductiveload—talk/contribs 09:17, 10 February 2021 (UTC)

@Wtfiv: JS sticking plaster in place. Let me know if you still can't see it. Inductiveload—talk/contribs 16:46, 10 February 2021 (UTC)

Thanks Inductiveload! The JS sticking plaster works! It gets the reader to the index page, which is invaluable for verifying the source. When researching the problem I saw the 2013 discussion, looked at the example it provided, and saw it was broken. I vaguely remembered having access to the source tab at some point in the process, and, as your note agrees, 2013 was a long time ago! So I figured it was fixed. It particularly helps readers with questions get to the original/translated pages. Thanks for taking care of that!

I have one more tentative request, realizing this may be a bigger issue than sticking plaster can handle: would it be possible to get back the page links on the left that point to the individual DjVu pages? If not, I appreciate what you have already done! Wtfiv (talk) 03:14, 11 February 2021 (UTC)

@Inductiveload: I saw that you have indeed been able to apply more plaster and bring the pages back to translations as well! Thank you... (I wonder how much work that was?) I may be sounding like the Fisherman's Wife here (or maybe you are still thinking it through), but would it be possible to shift the text to the right a couple of ems so that the page numbers don't overlay the text? Regardless, I am still grateful that you have been able to address a problem that has been hanging out there since 2013! And I am also appreciative of how responsive you've been regarding this issue. Also, I suspect I'm not the only one to feel this way, but it's nice to know you are there to support and guide us through the esoterica of this scriptorium. I like its solitude, but it's good to know there is someone out there to make sure the arcane works remain accessible! Wtfiv (talk) 19:45, 13 February 2021 (UTC)

Glad you like it here. I'm trying to kick things into shape a bit with the JS, I'm glad it's working. Getting the indent might or might not be easy, I'll give it a poke, but it might not be an overnight fix. Or maybe it will be!

Esoterica is certainly one way to put it. I'm trying to whip some docs into shape, but there's a long way to go yet!

Remember, I'm always happy to give a hand if I'm around, so feel free to ask. And pointing out places where the documentation doesn't make sense is a great help in streamlining the on-boarding process, so feel free to drop notes if anything doesn't make sense, or if it does but is unclear at first, or can't be found easily. Inductiveload—talk/contribs 20:05, 13 February 2021 (UTC)

@Inductiveload: I just wanted to say that your adding the page/DjVu links has already been helpful in my work on the translation. More than once, I've started seeing a systematic pattern that requires me to go back and change a recurring word's translation throughout the document, and the page number really helps me get to it quickly rather than having to click through each page in the index. And, in the unlikely event somebody wants to verify a translation, the page link makes that available to them as well. So for me at least, your repair of most of the 2013 damage to the translation pages is really appreciated!

Also, I can definitely see the work that has been done on the help pages. If you've been doing a lot of the contributions to them, I again thank you! Because of its more complex nature, Wikisource definitely has a steeper learning curve for navigating and editing than some of the other wikis. At some point, I may switch gears a bit and reflect on the kind of help that would have made things easier. (Though I think your friendly, encouraging response to my initial query, the creation of the initial DjVus, and providing some models to guide me by setting up the TOC and index pages were probably the greatest help.) Thinking about your offer on the help pages, I think an entire scriptorium discussion on how to flatten the initial learning curve may be useful, though it'd be good to get members at the beginner and intermediate stages involved, as their memories of learning to navigate are still recent.

One final observation from my time working here. As I delve deeper into Wikisource, I'm starting to realize that the service it provides to the larger community is not as great as it could be. It's already a great repository for classic and historically relevant proofread texts, and it is one of the best tools on the web for accessing these texts in various formats that actually work properly (e.g., mobi). But on the whole, much of what it offers still seems hidden from view. Outside of the Wikisource universe, the works seem hard to find, and there are not nearly as many links to these works in Wikipedia as it seems there should be. But that too may be something for a scriptorium conversation another day. As you can already tell, I am definitely enjoying the place. Wtfiv (talk) 19:06, 15 February 2021 (UTC)

I am really glad you're enjoying yourself!

One of the major, major issues we have is organisation and discoverability, since "big pile o' pages" is our default state. Portals are always suggested, but they're rarely set up in a really nice state and they often "peter out" after a couple of levels. If you have a special interest, working on a portal can be a nice thing (and tying it up to authors, works, scouting for works to add, etc.). For example, Portal:German literature is a fairly bland and disordered list, and a more "exhibition style" page with a curated section for the Big Authors, a section for "The Classics (TM)", and sections by subject, era, school (Romantic, Reformation, ...), etc., would probably be more engaging. And it's allowed to have some description in the Portals; it doesn't have to be just lists. For example, I tried to add some background to Portal:History of China (and then got sidetracked).

Work is slowly happening on categories, but IMO manually curating things will always be needed at least at the higher levels. Inductiveload—talk/contribs 19:20, 15 February 2021 (UTC)

Exploring how to work out the portals may be a great way to go! I took a look at yours and it does give me ideas. Thanks! Wtfiv (talk) 04:59, 17 February 2021 (UTC)

@Wtfiv: btw, I think I just fixed the issue with the overlapping page numbers. As far as I know (which is not very far), Translation space and mainspace should work the same with transclusions now. Inductiveload—talk/contribs 15:01, 1 March 2021 (UTC)

It looks great! And it is downright inspiring: time to do some more work! Not only are the pages functional for editing and verification as per your last fix, but now they look really nice for public viewing too. Thank you so much! Wtfiv (talk) 16:16, 1 March 2021 (UTC)

Template:Dialogue_indented

It's mostly unused. I was thinking that something like this should be a Module, which could potentially have no upper limit on the number of pairs that could be added.

I also made some changes recently to {{rbstagedir}}, mostly converting it to a classed SPAN instead of a call to {{float-right}}.

I've also got {{poem special/s}} working, as well as {{stagescript/s}} and family.

I also note {{playscript}}, which is table-based (which isn't necessarily ideal in a paged environment).

None of these are widely used outside of a few specific works, and thus I am wondering if the efforts made already should be combined into a single Module, based on what you had already achieved with ppoem, to work around some defects in the POEM extension.

ShakespeareFan00 (talk) 11:02, 28 February 2021 (UTC) (Aside: I've noticed that printed poetry (and hymnals) tends to move the end of a stanza that won't fully fit on a page onto the next page. This may be a consideration for export. I'm not sure how this could be done in CSS, though.)

Alignment of translation with original

Hello Inductiveload,

This question pertains to the text: https://en.wikisource.org/wiki/Page:Peregrinaggio_di_tre_giovani_figliuoli_del_re_di_Serendippo.djvu/1

I've completed the translation from the original Italian text to English by populating the left-hand pages. I've started the proofreading process and want to make sure I tidy up any loose ends, e.g. tweak the translation if needed, correct the English, etc.

One bothersome aspect is aligning the English translation as closely as possible with the Italian original. At the end of a page, the sentence structure of English and Italian often doesn't line up, so there has to be a decision about what to include on the current page and what on the next page. Also, as in the Italian original, the last sentence on a page is often not finished and continues on the next page. Question: is it a problem for taking the translation to its final phase if sentences are split between the bottom of one page and the top of the next? An example is pages 14 and 15, but the situation occurs on many consecutive pages. MvRwiki1944 (talk) 19:22, 25 January 2021 (UTC)

Your guidance will be appreciated.

@MvRwiki1944: good work! That's OK, it's just how translations are: you can't ensure the sentences split neatly across pages every time. As long as it's vaguely sensible, just choose whatever is easier (probably put the entire sentence/clause/etc. on the page where most of it is in the original). Inductiveload—talk/contribs 20:05, 25 January 2021 (UTC)

@Inductiveload: Proofreading completed. What to do next? Wait for the text to be validated? MvRwiki1944 (talk) 08:19, 11 February 2021 (UTC)

@MvRwiki1944: Wow, wonderful! The next step is to transclude it to mainspace and add it to {{new text}}s to strut its stuff. Validation will eventually happen when a suitably motivated Italian-English bilinguist comes along: it is no impediment to presenting the work in the mainspace. Inductiveload—talk/contribs 07:26, 18 February 2021 (UTC)

@Inductiveload: In the transclusion to mainspace, do we just manipulate the English translation, or do we also include the Italian original text for the benefit of the validators? The latter in the form of the manuscript or a transclusion to modern orthography? And where do I find the workspace to start the transclusion process? MvRwiki1944 (talk) 20:48, 27 February 2021 (UTC)

@MvRwiki1944: We don't have to include the Italian. What we can do is link both the enWS and itWS mainspace pages from the same Wikidata item (d:Q3563423). Then a link to itWS will appear on the enWS mainspace front page, and a link to enWS will appear on the itWS one.
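
To check which wikis a Wikidata item currently links to, the Wikidata action API's `wbgetentities` module can return just the sitelinks. A small sketch; `enwikisource` and `itwikisource` are the standard Wikidata site keys, while the helper names are hypothetical and the titles returned depend on the item's current state.

```javascript
// Build the Wikidata API URL that returns only the sitelinks of an item.
function sitelinksUrl(itemId) {
  const params = new URLSearchParams({
    action: 'wbgetentities',
    ids: itemId,
    props: 'sitelinks',
    format: 'json',
    origin: '*' // allow anonymous cross-origin requests from a browser
  });
  return 'https://www.wikidata.org/w/api.php?' + params.toString();
}

// Pull the linked page titles for the requested wikis out of the response.
function pickSitelinks(response, itemId, sites) {
  const entity = response.entities && response.entities[itemId];
  const links = (entity && entity.sitelinks) || {};
  const out = {};
  for (const site of sites) {
    if (links[site]) out[site] = links[site].title;
  }
  return out;
}

// Usage (browser, or Node 18+ with global fetch):
// fetch(sitelinksUrl('Q3563423'))
//   .then((r) => r.json())
//   .then((data) => console.log(
//     pickSitelinks(data, 'Q3563423', ['enwikisource', 'itwikisource'])));
```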

In theory, we can do parallel it/en text using inter-wiki transclusion, but normally you'd only do that after there's a "clean" English translation. In particular, parallel texts do not (currently) export well at all.

I suggest that you transclude at The Three Princes of Serendip (the title of the English Wikipedia article). You can mention the Italian title in the notes field of that page. Inductiveload—talk/contribs 19:12, 28 February 2021 (UTC)

@Inductiveload: So, I've started the transclusion with the Title page and the Contents (not in the original manuscript). How do I move on to a new section, presumably by creating a new page, without losing the first 2 pages? The next section is Imprimatur (pages 3-6). MvRwiki1944 (talk) 23:29, 28 February 2021 (UTC)

@MvRwiki1944: Whoops! Forgot this should have been in the Translation space! I moved it for you. I also created Translation:The Three Princes of Serendip/Imprimatur as a demo for transcluding the next set of pages.

Content that does not appear in the original generally does not go in the Page namespace, even if there is a handy blank page to put it in. Rather, we put it in the main (or translation!) space and use something like {{AuxTOC}} to mark it as "added value". Inductiveload—talk/contribs 23:40, 28 February 2021 (UTC)

@Inductiveload: Thanks for your assistance. Now, how do I move back and forth between the various subsections? I can get to each of them from the previous notification, but I don't see how I can move back to the title page and Contents from Imprimatur. MvRwiki1944 (talk) 00:05, 1 March 2021 (UTC)

@MvRwiki1944: generally, there's a link to the next and previous sections in the relevant header fields and a link to the title page in the title field on each page. All three should use relative linking. Inductiveload—talk/contribs 00:11, 1 March 2021 (UTC)

@Inductiveload: Thanks for your help. I finished the transclusion of the document to mainspace, moving from one section to the next, starting with your upload of the Imprimatur. Navigating backwards through the document via links doesn't seem to work, since the backward links aren't there. The document in mainspace needs major cleanup to move on to the next stage. In order to transclude by section, I moved material between adjacent pages in the proofread source document, so as not to have to deal with splitting pages. Any recommendations for cleanup are appreciated. MvRwiki1944 (talk) 02:06, 1 March 2021 (UTC)

@MvRwiki1944: I've moved them to our standard naming of Rank N, but I left the display names in the header as you had them.

You should also not need to move text between pages in the normal case. You can do it with Labelled Section Transclusion. Inductiveload—talk/contribs 08:16, 1 March 2021 (UTC)
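
A sketch of how that fits together, with hypothetical section names: on the Page: where one part ends and the next begins, each part is wrapped in labelled-section markers such as `<section begin="imprimatur" />…<section end="imprimatur" />`, and the mainspace (or Translation) page picks up only its part through the `<pages>` tag's `fromsection`/`tosection` attributes. The helper below only assembles such a tag; the page range 3-6 is the Imprimatur range mentioned in this thread and may not match the file's actual page numbering.

```javascript
// Assemble a ProofreadPage <pages /> tag. `fromsection` limits the first
// page to a labelled section; `tosection` limits the last page.
function pagesTag({ index, from, to, fromsection, tosection }) {
  const attrs = [`index="${index}"`, `from=${from}`, `to=${to}`];
  if (fromsection) attrs.push(`fromsection="${fromsection}"`);
  if (tosection) attrs.push(`tosection="${tosection}"`);
  return `<pages ${attrs.join(' ')} />`;
}

console.log(pagesTag({
  index: 'Peregrinaggio di tre giovani figliuoli del re di Serendippo.djvu',
  from: 3,
  to: 6,
  tosection: 'imprimatur' // hypothetical label set on the last page
}));
```

The following section would then start on the same page with `from=6` and `fromsection` set to that section's label, so nothing has to be moved between pages.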

@Inductiveload: I had moved all the text to mainspace, but now all text after the Prologue is gone in mainspace. Also, I still don't see functional backward links. How do I get back to the full text without having to recreate it? MvRwiki1944 (talk) 12:02, 1 March 2021 (UTC)

@MvRwiki1944: Look at the table of contents at Translation:The Three Princes of Serendip. It's all there. The backlinks should be formatted like [[../Foreword/]], otherwise they will not be links. Inductiveload—talk/contribs 12:07, 1 March 2021 (UTC)
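
Roughly how such a link resolves, as a simplified model rather than MediaWiki's own code, and assuming subpages are enabled in the namespace (as in enWS mainspace and Translation space): each leading `../` drops one path component from the current page title, and a trailing `/` makes the link display only the remaining name. The function name is illustrative.

```javascript
// Simplified model of "../" subpage links. Returns { target, text }, where
// text is the display text for links ending in "/" and null otherwise
// (display text for other forms is not modelled here). Returns null for
// links that are not "../" links or that climb above the top-level page.
function resolveRelativeLink(currentTitle, link) {
  let rest = link;
  let up = 0;
  while (rest.startsWith('../')) {
    up += 1;
    rest = rest.slice(3);
  }
  if (up === 0) return null; // not a "../" link
  const parts = currentTitle.split('/');
  if (up >= parts.length) return null; // can't go above the top-level page
  const hideParent = rest.endsWith('/');
  rest = rest.replace(/\/+$/, '').trim();
  let target = parts.slice(0, parts.length - up).join('/');
  if (rest !== '') target += '/' + rest;
  return { target, text: hideParent ? rest : null };
}

const example = resolveRelativeLink(
  'Translation:The Three Princes of Serendip/Imprimatur', '../Foreword/'
);
console.log(example.target); // Translation:The Three Princes of Serendip/Foreword
console.log(example.text);   // Foreword
```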

@Inductiveload: I can see the backlinks now. The next task is to clean up the text in mainspace, starting with the running-together of letters in the paragraphs, e.g. in Imprimatur the words 'consequence' and 'suffice'. Can this be edited out in mainspace, or do I need to go back to the Page namespace? MvRwiki1944 (talk) 12:29, 1 March 2021 (UTC)

@MvRwiki1944: What do you mean by "running together"? The words look OK to me. The page numbers do overlap the text, but I'm currently working on that. Can you take a screenshot? Inductiveload—talk/contribs 12:32, 1 March 2021 (UTC)

@Inductiveload: Sent a screenshot of Imprimatur, showing the words 'consequence' and 'suffice'. This running-together of letters occurs all over the text. Sent an email with a PDF attachment to wiki@wikimedia.org, in reply to your most recent email to me. MvRwiki1944 (talk) 12:54, 1 March 2021 (UTC)

@MvRwiki1944: I don't think you can reply to email "pings". Maybe just upload the screenshot to imgur.com (or wherever) and post the link. Just the ID part of the URL will do if it sets off the spam filter. Inductiveload—talk/contribs 13:07, 1 March 2021 (UTC)

@Inductiveload: Not familiar with imgur.com (or whatever), and link to what? The PDF is a screenshot of Imprimatur as it appears on my computer. MvRwiki1944 (talk) 13:18, 1 March 2021 (UTC)

@MvRwiki1944: It's a website for posting images. I don't receive emails sent to wiki@wikimedia.org. Or you can email me <removed>. Inductiveload—talk/contribs 13:39, 1 March 2021 (UTC)

@MvRwiki1944: so you did mean a collision with the page numbers. Can you try it now? Inductiveload—talk/contribs 13:55, 1 March 2021 (UTC)

@Inductiveload: Looks great now. Problem solved. Is the transcluded document the mainspace version, and is it the document the validators work from, checking it against the Page namespace pages they can reach by clicking the numbers in the left margin? Has the text been reclassified as "to be validated"? And what should I do next? Also, what is the shortest path to the document from the Main Page? (talk) 01:06, 2 March 2021 (UTC) MvRwiki1944 (talk) 17:03, 2 March 2021 (UTC)

@MvRwiki1944: It is now live on the front page. I updated the index page status to "to be validated" and created Author:Cristoforo Armeno. Inductiveload—talk/contribs 17:34, 2 March 2021 (UTC)

Continued listitems...

Page:Russell_Bucklew_v._Anne_L._Precythe,_Director,_Missouri_Department_of_Corrections.pdf/42 has a continued list item.

I was using Template:Plainlist/single.css via TemplateStyles to suppress the marker on a continued list item. However, a very helpful user on the Wikimedia Discord's technical channel expressed concerns that this might not be an ideal way of doing it.

As you were involved in getting things to 'export' cleanly, I was wondering if you had any views? ShakespeareFan00 (talk) 10:01, 2 March 2021 (UTC)

Since I know you love writing docs

User:Xover/Template guidelines

Input (very very!) welcome. Feel free to edit directly. Won't be offended if you think any part of it is idiotic (but might of course push back on it since everything I write is obviously the work of genius), nor if you think it's a waste of time (see previous parenthetical).

I'm thinking it can become a guideline-guideline eventually, and something we actively work (long term) on migrating existing stuff to.

And it's meant to be a pretty technical guideline. Target audience is me, you, and anybody else with their fingers in the technical guts. So it's fair game to talk about p-wrapping, margin collapsing, semantic-ish templates, etc. But it should be sufficiently readable to the community at large that they can look and see that it is good (nodding sagely is advised). --Xover (talk) 10:46, 2 March 2021 (UTC)

Blanking as 'no text'

I noticed your contribution here. Can you clarify the reason for this? AnotherEditor144 ^{t - c} 12:18, 2 March 2021 (UTC)

@AnotherEditor144: As I mentioned in my edit summary, this is an ex libris sticker, and, since it is not part of the work in question, we do not reproduce it. We also don't reproduce barcode stickers, call number stickers, the card pocket in the back of some library books, library stamps, scribbling by the book owners/borrowers (unless that's a historically useful work in its own right, Fermat's famous marginalia comes to mind), or library or digitization watermarks or registration markers. Basically, if it's not part of the book when published, we don't usually reproduce it. Inductiveload—talk/contribs 12:25, 2 March 2021 (UTC)

Thank you. I thought you might be interested in this.

Machinell translated

@Inductiveload: Hello Inductiveload, we had a discussion about machine translation. You wrote to me that I first had to insert a license template. I have done that now, but it still was not translated. I do not understand why it was not translated during uploading. Thanks from Germany

--Riquix (talk) 06:26, 7 March 2021 (UTC)

@Riquix: Sorry, I don't follow. What's the problem here? The file at Commons has a license template. The problem before was that it did not have a license template. What translation were you expecting to happen? Inductiveload—talk/contribs 09:26, 9 March 2021 (UTC)

@Inductiveload: If an experienced user had made it, all the pages would have been translated automatically. Now we have to press the OCR button on every single page. Look at any page here: https://en.wikisource.org/wiki/Index:Essays_On_The_Gita_First_Series_(1922).pdf --Riquix (talk) 10:19, 9 March 2021 (UTC)

Oh I see, this file does not have a text layer. It's also missing several pages. Digital Library of India really do produce some rubbish scans. I'll try to source a better scan from here, but it'll take some time to download and convert. Inductiveload—talk/contribs 10:27, 9 March 2021 (UTC)

@Riquix: try this scan: Index:Essays On The Gita - Ghose - 1922.djvu. It's a different printing (it's the original 1922 Madras edition), but the content should be the same. All pages appear to be present, and I've put in a text layer for you. Inductiveload—talk/contribs 11:34, 9 March 2021 (UTC)

Many thanks !--Riquix (talk) 07:11, 10 March 2021 (UTC)

Woa there, cowboy!

This needs some discussion first! --Xover (talk) 19:10, 10 March 2021 (UTC)

@Xover: There's zero visual difference (it still uses {{new texts/item}}) and no difference in functionality from how it used to be, except that month-by-month archiving is now completely automatic. Inductiveload—talk/contribs 19:24, 10 March 2021 (UTC)

You're sending end users into Lua code, and completely changing the workflow for new texts. There are technical differences, a completely new and alien syntax, and there's no guarantee those who interact with this workflow are capable of or comfortable with the new version. This can't just drop in without warning. I actually also have some technical concerns on my own account, but those are a secondary point. --Xover (talk) 19:33, 10 March 2021 (UTC)

Reverted.

What technical issues did you have in mind? Inductiveload—talk/contribs 19:37, 10 March 2021 (UTC)

Mainly that as a user-facing interface ("config file"), Lua datastructure syntax is fragile: forget a comma, or any one of a billion other obscure technical details, and you start throwing big red error messages and break the whole system (an error in one leaf breaks the whole tree). Compare the old version: each entry is an individual template, where a mistake is usually immediately obvious, and, crucially, cannot break other entries.

Just for contrast, when I've previously toyed with ways to improve that (admittedly rather baroque) process, it's been in the direction of a default-Gadget-provided JS UI to manage the list and possibly JSON storage for the actual data (i.e. explicitly not user-editable). Not as an actual proposed "better" way to do it (I've "toyed with the idea" not actually thought about it), but just to make clear my frame of reference and thinking related to this. --Xover (talk) 19:50, 10 March 2021 (UTC)
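For comparison, a minimal sketch of the "JSON storage plus a tool" idea, written in Python for brevity rather than as a JS gadget or Lua module. It assumes the data is a JSON list of objects with the title/author/year fields used in the Lua example further down this thread (the real {{new texts/item}} schema may differ), and `render_entries` and the output format are hypothetical. What it illustrates is the per-entry isolation point: each entry is checked on its own, so a bad entry is reported instead of breaking the whole list.

```python
import json

# Hypothetical schema: the three fields from the Lua example in this thread.
REQUIRED = ("title", "author", "year")


def render_entries(raw_json):
    """Validate and render each entry on its own; bad entries are reported, not fatal."""
    rendered, errors = [], []
    for i, entry in enumerate(json.loads(raw_json)):
        if not isinstance(entry, dict):
            errors.append(f"entry {i}: not an object")
            continue
        missing = [k for k in REQUIRED if entry.get(k) in (None, "")]
        if missing:
            errors.append(f"entry {i}: missing {', '.join(missing)}")
            continue
        # Placeholder rendering; a real tool would emit whatever the main page needs.
        rendered.append(f"{entry['title']}, {entry['author']} ({entry['year']})")
    return rendered, errors
```

Note that a JSON document that is malformed as a whole still fails at `json.loads`, which is one reason to keep such data out of direct hand-editing and behind a tool.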

@Xover: I thought of that, but the MW Lua editor will not allow you to save a syntactically invalid table. The worst you can do is screw up the data itself so badly that it can't even be loaded, which takes some doing (it is possible, e.g. by referencing a global). But nearly all practical ways to do that end up with the same result as now: {{new texts/item}} chokes on it. For example, the most likely syntactically valid mistake a user will make is forgetting quotes:

    {
        title = George,
        author = Algernon,
        year = 1875
    },

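In which case it breaks "normally". You can't write title = George Chapman (that won't save), and neither will a table with a missing comma. About the only thing I've managed to find that actually causes a load failure but is still allowed to be saved is forgetting a comma and the quotes and not putting the entry in {} (and not checking it before saving). Lua reads foobar {…} as a call to a global function foobar with the table as its argument, which is valid syntax but fails when the data is loaded: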

		foobar
        {
            ....

I've also considered a tool like you say, but I would bet half a bitcoin that it would become unmaintained the day its inventor gets hit by a bus, like PageNumbers.js or Match and Split (depending on whether it's a local JS tool or a Toolforge tool). Plus it'll be an even bigger PITA if dodgy data sneaks in and the tool doesn't also make it really obvious how to revert it.

Also, arguably, there's not much practical difference between one failed entry (because the user missed a { in the current system) and an exploded data table. The main page is still busted, probably with a red error in either case, and will be fixed or reverted ASAP by someone. If we were super-serious about avoiding that, we'd detect the error text (or blanking, or whatever) with a bot watching the page and have it insta-revert (or allow only edits to a canary page, and have it mirrored to the real page if it is suitably not on fire). But that applies right now too. Inductiveload—talk/contribs 20:31, 10 March 2021 (UTC)

Oh, interesting. I didn't even consider that the editor might prevent this. Input validation FTW!

But I still think the syntax is more complicated and fragile from a user perspective. The template version is far from great, but we do sort of have to presuppose that users are minimally comfortable with templates to contribute. A complex data structure in Lua is a step too far IMO. Even without the breaks-everything vs. breaks-this-entry issue, the myriad ways this can "not work" feel fragile and obtuse to the user, especially since just hitting the "Preview" button won't actually work.

But to be clear: I'm not saying we can't go that way, no way no how not never. I'm saying it's a big enough change that we need to let the community decide if they're comfortable with it before implementing it. For my own part I'd much prefer editing the Lua config to editing MW templates (did I mention I hate templates?), so it's no skin off my own back. Whether it looks the same on the main page, or throws ugly error messages, isn't really a main concern for me: error messages are good so we discover and can fix issues, and the main page is way overdue for some tweaks. It's the user experience that concerns me. But it's entirely possible my concerns are overblown. --Xover (talk) 16:50, 14 March 2021 (UTC)

T277451

Latest comment: 3 years ago · 6 comments · 2 people in discussion

I knew I wasn't crazy! :)

But that's just the pagenum span. While you were digging around, you didn't happen to find code to suppress the transclusion of the actual page content, did you? Cf. e.g. this sandbox. If the page content is transcluded, it's arguable that the pagenum should be too, for consistency. There may be legitimate uses for pulling such pages in through PRP (vs. just normal MW transclusion like you'd do for the ToC on an Index:), but in my experience these always end up as exclude=pp. in the <pages … /> tag. --Xover (talk) 08:29, 16 March 2021 (UTC)

@Xover: I know what you mean, but I was more about fixing the issue in the code as written, since that 1) was easy, 2) fixes our issue where the W/T number clobbers the one you want, and 3) won't really change on-wiki behaviour (I have no idea if people are using W/T pages for weird and nefarious purposes on other Wikisourcen). Doing what you suggest would mean stuffing all the transclusion code into the if statement:

if ( $qualityLevel !== PageLevel::WITHOUT_TEXT ) {
    // Page-number span, rendered through MediaWiki:Proofreadpage_pagenum_template
    $pagenum = $pageNumber->getRawPageNumber( $language );
    $formattedNum = $pageNumber->getFormattedPageNumber( $language );
    $out .= '<span>{{:MediaWiki:Proofreadpage_pagenum_template|page=' . $text .
        "|num=$pagenum|formatted=$formattedNum}}</span>";

    // Page content: labelled-section transclusion (#lst) when a from/to/only
    // section applies to this page, otherwise the whole page
    if ( $from_page !== null && $page->equals( $from_page ) && $fromsection !== null ) {
        $ts = '';
        // Check if it is single page transclusion
        if ( $to_page !== null && $page->equals( $to_page ) && $tosection !== null ) {
            $ts = $tosection;
        }
        $out .= '{{#lst:' . $text . '|' . $fromsection . '|' . $ts . '}}';
    } elseif ( $to_page !== null && $page->equals( $to_page ) && $tosection !== null ) {
        $out .= '{{#lst:' . $text . '||' . $tosection . '}}';
    } elseif ( $onlysection !== null ) {
        $out .= '{{#lst:' . $text . '|' . $onlysection . '}}';
    } else {
        $out .= '{{:' . $text . '}}';
    }
    $out .= $placeholder;
}

Inductiveload—talk/contribs 08:36, 16 March 2021 (UTC)

Yeah, that's why I wondered if there was existing code for it that was just broken like the pagenum stuff. A bug would presumably be easy to fix, but a change that affects on-wiki behaviour would at the very least require research, and worst case also community consultation (which I don't think any of the usual suspects have the capacity for these days). --Xover (talk) 08:51, 16 March 2021 (UTC)

AFAICT, there isn't broken code for it; it's "supposed" to be like that. But I would say that the logic of suppressing the page number and inter-page separator but not the content is flawed. It's just that on wikis (all of them?) where those pages are usually completely empty there's no visible difference, so no one has cared (or they have assumed it is intended and excluded them manually when needed).

I'll file an issue and patch (after a conflicting patch has gone in) and we'll see if anyone screams. If nothing else, there'll be a task+patch in the system for future reference. Inductiveload—talk/contribs 08:56, 16 March 2021 (UTC)

Oh, that reminds me… We should have a cleaned-up version of User:Xover/notext.js as a default gadget. I think maybe I saw you had something similar sitting somewhere? In any case, it should empty the text fields when "Without text" is chosen, and restore them when choosing something else (in case the user just misclicked). I'll get around to fixing it up eventually, but feel free to jump ahead if you want. --Xover (talk) 09:36, 16 March 2021 (UTC)

@Xover: I have the Index preview grid, developed for a driven-off contributor, which provides "set as empty" in the alt-click pop-up: User:Inductiveload/index_preview, but nothing like that in edit mode.

Messing with the textbox is frustrating with JS because it kills the undo buffer (at least it does for me in both Firefox and Chrome), so restoring the text after a JS intervention isn't trivial (this is incredibly annoying sometimes). So it'll need special handling other than just Ctrl-Z on the text-box. Inductiveload—talk/contribs 10:09, 16 March 2021 (UTC)
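A minimal sketch of that stash-and-restore behaviour, modelled in Python rather than as the actual JS gadget. Everything in it is hypothetical: `PageEditorModel` stands in for the edit form, and a single `text` field stands in for the header/body/footer boxes. Level 0 is ProofreadPage's "Without text" quality level. Since a script that rewrites the textbox loses the browser's undo history, the sketch keeps its own copy of the text and only restores it if the user has not typed anything in the meantime.

```python
WITHOUT_TEXT = 0  # ProofreadPage quality level "Without text"


class PageEditorModel:
    """Hypothetical model of the proposed gadget's state handling."""

    def __init__(self, text, level):
        self.text = text
        self.level = level
        self._stash = None  # our own copy, since the browser undo buffer is lost

    def set_level(self, new_level):
        if new_level == WITHOUT_TEXT and self.level != WITHOUT_TEXT:
            # Switching to "Without text": remember the text, then empty the box.
            self._stash = self.text
            self.text = ""
        elif new_level != WITHOUT_TEXT and self.level == WITHOUT_TEXT:
            # Switching away again (e.g. a misclick): restore only if untouched.
            if self._stash is not None and self.text == "":
                self.text = self._stash
            self._stash = None
        self.level = new_level
```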

Lua table length

Latest comment: 3 years ago · 4 comments · 2 people in discussion

Incidentally, since I saw a comment of yours somewhere… Lua table length can be counted with # iff its keys are monotonically increasing integers starting at 1. It's one of the quirks of Lua's design that makes absolutely no sense to anyone. --Xover (talk) 08:33, 18 March 2021 (UTC)

@Xover: that's what I thought, but the table that mw.loadData() provides has metamethods that mean #foo, annoyingly, does not work on it (cf. the last bullet here). My workaround was a simple counter loop, which is brutalist but effective. Inductiveload—talk/contribs 09:21, 18 March 2021 (UTC)
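To make the distinction concrete, a small Python model of the two behaviours (this is not Lua: a dict stands in for a Lua table and `None` for nil). `lua_sequence_length` returns the end of the unbroken 1..n integer-key run, which is what `#` reliably gives for a proper sequence; `count_entries` is the brute-force counter loop, roughly `for _ in pairs(t) do n = n + 1 end` on the Lua side. Both function names are illustrative.

```python
def lua_sequence_length(table):
    """Length of the unbroken run of keys 1, 2, 3, ... (where # is reliable)."""
    n = 0
    while table.get(n + 1) is not None:
        n += 1
    return n


def count_entries(table):
    """The 'counter loop' workaround: count every key, like iterating pairs()."""
    count = 0
    for _ in table:
        count += 1
    return count
```

For a table with non-sequence keys the two differ: only the counter loop sees every entry.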

Ah, yes, metamethods… How do I loathe thee? Let me count the ways… --Xover (talk) 09:33, 18 March 2021 (UTC)

Remember you can't use #metamethod_loathings :-D Inductiveload—talk/contribs 09:34, 18 March 2021 (UTC)

Template:Kut-eng and Template:Kutenai

Latest comment: 3 years ago · 2 comments · 2 people in discussion

Found this situation while hunting lint errors. Given the large number of templates here, I am wondering if what's needed is a module-based engine where each pair is placed on its own line? ShakespeareFan00 (talk) 10:21, 18 March 2021 (UTC)

This should probably be ruby:

Example

<ruby style="ruby-position:under;">
<rb>taʹx̣as</rb>
<rp>(</rp><rt>Then</rt><rp>)</rp>
</ruby>

taʹx̣as (Then)

Inductiveload—talk/contribs 12:54, 18 March 2021 (UTC)
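If a "one pair per line" engine were built on the ruby approach, its core might look like this minimal Python sketch. The `base|gloss` input format and the function names are hypothetical; the markup it emits follows the ruby example above, including the <rp> parentheses that browsers without ruby support display instead.

```python
from html import escape


def parse_pairs(text):
    """Hypothetical input format: one 'base|gloss' pair per line; blank lines skipped."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        base, _, gloss = line.partition("|")
        pairs.append((base.strip(), gloss.strip()))
    return pairs


def ruby_line(pairs, position="under"):
    """Emit one <ruby> element per (base, gloss) pair, gloss below the base text."""
    parts = []
    for base, gloss in pairs:
        parts.append(
            '<ruby style="ruby-position:%s;"><rb>%s</rb><rp>(</rp><rt>%s</rt><rp>)</rp></ruby>'
            % (position, escape(base), escape(gloss))
        )
    return " ".join(parts)
```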

An em is not an em

Latest comment: 3 years ago · 6 comments · 2 people in discussion

cf. sandbox.

So it looks like &emsp; is not actually 1em wide, which makes trying to align other bits with :-indented lines inside {{ppoem}} rather a challenge. See example (the drop-initial makes everything complicated, and here it has to be manually shifted due to indented lines).

I would suggest mimicking {{gap}}'s approach to get a predictable width, but possibly using &emsp; instead of WORD JOINER inside the span (so the semantics match the context).

PS. I think possibly the drop-initial would have been easier here if I could have disabled the hanging indent on the first line, since both rely on text-indent/margin for their effect. Might be worth considering as magic ppoem-syntax at some point. --Xover (talk) 09:54, 20 March 2021 (UTC)

@Xover: Re emsp: that probably makes sense.

Re the dropinitials: these have been a "bit" of a pain. I think you actually generally want to keep the hanging indent even after a DI if possible, because you still need it to show the next line is a continuation of the first, not a new line (normally, printed matter still keeps the hanging indent for the first line). However, CSS seems awfully reluctant to allow that without prematurely wrapping the first line. Setting the first line's width to > 100% works but you need to know the extra to add, which you obviously do not know in general. Cogitation continues.

We might also want to consider a syntax to allow a ppoem-line to acquire a class (and, maybe, style), somewhat like tables do for cells. Probably not for common use, but would add flexibility. Inductiveload—talk/contribs 11:32, 20 March 2021 (UTC)

Dropcaps are… yeah. Thanks to having to fake them with floated content we're always going to be in a world of pain there. I have wondered, though, whether we could fix the premature line wrapping by adding Yet Another fake span, z-indexed to the back and turned functionally invisible, but positioned such that it lets the browser calculate the actual width of the box. I think this behaviour might actually be a hole (not a bug, per se, but an unintended corner case) in the CSS box model. Possibly it's that white-space needs, in addition to normal and pre-wrap, a please don't ever wrap this line unless you really physically have to, and calculate its width accordingly. In any case, yeah, preferably preserve hanging indent, but I'd probably trade it for being able to achieve other things if forced to choose.

I've also loosely wondered if, since you're already parsing wikicode here, we could let templates like {{di}} signal their presence to obviate the need for manually adding constructs like {{di|A}} << HOY!. Anything from strip codes to HTML classes could conceivably work there, and could be emitted either always or by explicit request along the lines of {{di|A|mode=ppoem}}.

A class on lines is probably a good idea, but may want to wait for a clear need (per-work CSS may generate that) to avoid having too much syntax with too obscure use cases. Inline style is an emergency solution, so I'd urge a very clear need before implementing that. Could predefined styles cover 80% of the need reasonably? Is there an acceptable manual fallback for the remaining 20%? If so I would argue we could avoid inline styles altogether. --Xover (talk) 18:09, 20 March 2021 (UTC)

@Xover: "Dropcaps are… yeah": …yeah.

"since you're already parsing wikicode here": I'm not really, though. I could inspect the content for something like {{(di|dropinitial)\| but that's a rather uncomfortable coupling between the templates. I'd almost rather go for a DI syntax within ppoem.

"we could avoid inline styles altogether": it's fairly likely that per-work CSS will permit this, as long as you can apply classes to lines.

Inductiveload—talk/contribs 13:22, 21 March 2021 (UTC)

DI syntax: I wasn't really thinking about looking for the template invocation (regex parsers suck), although if anything would merit such a hack it'd be {{di}}. I was thinking more along the lines of making any template that needed it emit a control code (strip markers spring to mind, but I'm sure there are other ways that would work). But I haven't really looked at how you're doing the magic syntax here, so I don't know what'd be a non-yucky way to do it. The amount of magic syntax so far seems to sit pretty solidly in the Goldilocks zone (I'm amazed at the sheer number of pages I've been able to do with essentially only ppoem!), but adding too much more is, IMO, rapidly taking it into the "unwieldy monster" end of the scale. --Xover (talk) 15:02, 21 March 2021 (UTC)

@Xover: glad to hear about the magic syntax. I'm also cautious about taking it too far into "black magic" territory (also I don't want to end up badly re-writing a LALR parser or something as a module to parse a mini-language and/or ending up with {{TOCstyle}}).

Because this is a template, it only sees incoming {{di}}s as literal "{{di|...}}", and the module spits the wikicode out verbatim, as it comes. They expand later in the parser, not in the module. In theory, you could capture them and expand them in the module with frame.expandTemplate, but that would be a very last resort, IMO. Inductiveload—talk/contribs 16:47, 21 March 2021 (UTC)

Crossing pages is the cross we bear

Latest comment: 3 years ago · 7 comments · 2 people in discussion

Any thoughts on handling lines that cross pages inside a ppoem run?

The case is actually a play where everything is spoken in verse, but stage directions show up as a long unwrapped "line" that can span across page boundaries. I can only come up with moving text between pages (which pains my obsessive streak) or splitting it out of the ppoem block (which has the same alignment problems old poem has). I could also abuse hws/hwe I suppose, but for about a short paragraph's worth of text that feels… icky.

Any clever solutions I haven't thought of? I don't think this can be fully solved without a new start/end model and making ppoem an extension (either part of or at least tightly integrated with PRP), can it? --Xover (talk) 11:12, 21 March 2021 (UTC)

Oh, and the example is at Page:War, the Liberator (1918).djvu/92 and /93. I ended up moving the text to the latter page. But I don't have to like it! :) --Xover (talk) 12:42, 21 March 2021 (UTC)

@Xover: I think we can get away with just a new start/end "no-line-break" model and just omit the span-close and open on transclude. The pagenum span will occur half-way through the line, but that's fine, I think (or even preferable).

I'm open to extension-ification, but I'd rather get a template working and then upgrade, or we'll be here all decade. Inductiveload—talk/contribs 13:26, 21 March 2021 (UTC)

Oh, you're right and I'm just being dumb; I wasn't thinking across namespaces at all. Yeah, that would presumably work (unless the parser steps in and p-wraps it into oblivion of course). And I'm in no hurry for extensionification either; I'm by no means certain the benefits (which may only be the ability to do a /s+/e model) will outweigh the costs. My playing with it so far suggests it works astonishingly well. I'll try to cobble together something resembling structured feedback when I'm done playing. --Xover (talk) 14:54, 21 March 2021 (UTC)

@Xover: This appears to be working now with start/end set to same-line. Open to suggestions if that is not a clear value. Inductiveload—talk/contribs 12:26, 22 March 2021 (UTC)
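A minimal sketch of the same-line joining idea in Python: each line is wrapped in a span, but with end set to same-line the last span on a page is left open, and with start set to same-line the first line on the next page gets no opening tag, so the concatenated pages yield one continuous line. The `line` class, the `stanza-break` default and `render_page` are placeholders, not ppoem's actual output, and the page-number span that would sit between the pages is left out.

```python
def render_page(lines, start="stanza-break", end="stanza-break"):
    """Hypothetical sketch: wrap lines in spans, leaving the page edges open for same-line."""
    out = []
    for i, line in enumerate(lines):
        open_tag, close_tag = '<span class="line">', "</span>"
        if i == 0 and start == "same-line":
            open_tag = ""  # continues a line left open on the previous page
        if i == len(lines) - 1 and end == "same-line":
            close_tag = ""  # line continues on the next page
        out.append(open_tag + line + close_tag)
    return "\n".join(out)
```

For example, `render_page(p1, end="same-line") + " " + render_page(p2, start="same-line")` produces balanced spans, with the stage direction that crosses the page boundary inside a single span.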

The page joining indeed seems to work. In re the value, the keyword "continue" pops up in my head, but I'm not sure it makes any kind of sense. --Xover (talk) 18:06, 22 March 2021 (UTC)

@Xover: "Continue" was in my head too, but we need to make sure this and "follow" aren't too confusable. Unless follow is "no-stanza"/"same-stanza" or something? Inductiveload—talk/contribs 18:16, 22 March 2021 (UTC)

Large initial

Latest comment: 3 years ago · 3 comments · 2 people in discussion

Hello. I have noticed that you added the image parameter to the template {{dropinitial}}. Would it be possible to do the same with {{largeinitial}} too? I tried to add the image simply into the first parameter of the template, but for some reason it does not keep the baseline, see Page:Czechoslovakia's tribute to the memory of Woodrow Wilson.djvu/11. What do you think? --Jan Kameníček (talk) 22:11, 22 March 2021 (UTC)

@Jan.Kamenicek: I added the parameter and it seems to work OK. :-) Inductiveload—talk/contribs 22:50, 22 March 2021 (UTC)

Absolutely perfect! Thanks very much. --Jan Kameníček (talk) 00:15, 23 March 2021 (UTC)

Finally caught a break...

Latest comment: 3 years ago · 14 comments · 2 people in discussion

...or something that broke, in any case. :)

See War, the Liberator, and Other Pieces/My Old Grenade (transcluded from pp. 128 and 129).

My immediate guess is that nesting ppoems will generally break in similar ways without special params to handle it. But it's possible this particular formatting would be better done with either an embedded {{center block}}, or possibly explicit magic syntax for the stanza (e.g. "the following stanza is a centered block within the overall width of the poem block").

But on the plus side, this is the first case that seems to break in about a hundred pages of ppoem use (variability low to medium, so not a real torture test; but fairly representative). --Xover (talk) 09:06, 23 March 2021 (UTC)

Hmm, this is probably a case for phab:T276681 (per-stanza classing), possibly with a built-in class for block-centred. Inductiveload—talk/contribs 09:12, 23 March 2021 (UTC)

Yeah. The following poem also has what looks like a stanza block-aligned to the right that I predict similar problems will accrue to. --Xover (talk) 09:17, 23 March 2021 (UTC)

The latest edits broke this, btw.

A << B and the rest of the alphabet.

Just in case you hadn't already caught it yourself. :) --Xover (talk) 12:40, 23 March 2021 (UTC)

@Xover: sorry, I didn't mean to break it while I was faffing! Anyway, I have something like stanza styles working now (plus a tidier module with...tests!).

The margins are still causing a few issues with the block centre's exact alignment, but the general idea and syntax is there, at least. Inductiveload—talk/contribs 13:49, 23 March 2021 (UTC)

Awesome! Page:War, the Liberator (1918).djvu/134 is probably a good in-the-wild example of the margins not quite lining up (well, either that or my eyes are going crosswise). No worries about the breakage: this is experimental use.

Oh, hmm. Why is the example above wrapping now? --Xover (talk) 14:24, 23 March 2021 (UTC)

@Xover: where is it wrapping? Bombers and Grenade are both unwrapped on my screen.

In general, DIs may well cause a spurious wrap specifically if the lines next to them are within 4em of the longest in the whole poem, or if the DI itself is bigger than 4em (cf. last example in the docs). Inductiveload—talk/contribs 14:50, 23 March 2021 (UTC)

In this example…

A << B and the rest of the alphabet.

…the A is above the B on my end. --Xover (talk) 14:52, 23 March 2021 (UTC)

@Xover: Ah, that's because A is not floated left like a DI would be. The lines are display:block (for the moment) to allow them to have a hanging indent each (without being paragraphs). Inductiveload—talk/contribs 15:00, 23 March 2021 (UTC)

Ah, yes. I see now. Thanks. --Xover (talk) 15:02, 23 March 2021 (UTC)

@Xover: btw, the :::: syntax now produces a {{gap}}-like fixed-width span of that many 'em', containing that many emsp's for copy-paste. Inductiveload—talk/contribs 16:07, 23 March 2021 (UTC)
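A minimal sketch of that indent rule in Python: N leading colons become an inline-block span exactly N em wide, holding N &emsp; entities so the whitespace survives copy-paste. Fixing the width with CSS sidesteps the problem that &emsp; itself is not reliably 1em wide. The inline style is an assumption for illustration; the module's real markup (for example, a class instead of inline CSS) may differ.

```python
def indent_span(n):
    """Fixed-width indent: n em wide, containing n &emsp; for copy-paste."""
    style = "display:inline-block;width:%dem;white-space:nowrap;overflow:hidden;" % n
    return '<span style="%s">%s</span>' % (style, "&emsp;" * n)


def render_line(line):
    """Replace a run of leading ':' characters with an indent span."""
    text = line.lstrip(":")
    n = len(line) - len(text)
    return indent_span(n) + text if n else line
```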

Oh, excellent! I just had occasion to test with a single em and it worked great. PS. Also ran across some possible pre-defined classes for poem (and possibly stanza) here. --Xover (talk) 19:14, 23 March 2021 (UTC)

@Xover: Hmm, how do you think this should work? Provide a handful of built-in class names via Template:Ppoem/styles.css (say one for text-align:center/right, smaller, and fine)? I think the only one that makes sense to be magic is text-align center/right, but then what's the syntax? Inductiveload—talk/contribs 19:22, 23 March 2021 (UTC)

For this case I was thinking pre-defined classes in styles.css. I don't immediately see any need for magic syntax for this. Based on the use case I ran into I was thinking maybe the standard bunch of size classes, and left/center/right/justify, and a couple of line-height values to cover common cases. And I was thinking mostly in terms of stuff that could be cleanly implemented as classes (i.e. nothing like {{di}} here). But maybe I haven't thought it through enough? --Xover (talk) 19:37, 23 March 2021 (UTC)

The other initial has dropped!

Latest comment: 3 years ago · 4 comments · 2 people in discussion

“OR rather, the quotation mark has escaped.” :) --Xover (talk) 15:13, 23 March 2021 (UTC)

ARGH! Inductiveload—talk/contribs 15:14, 23 March 2021 (UTC)

This should be fixed now (the floated thing floats too). Inductiveload—talk/contribs 09:38, 24 March 2021 (UTC)

Yeah, verified. Thanks! --Xover (talk) 10:51, 24 March 2021 (UTC)

Block based templates

Latest comment: 3 years ago · 7 comments · 2 people in discussion

Ppoem's docs probably need a separate caution about block-based (div/p) templates inside stanzas. And depending on how many of those crop up regularly in poems, we may want to maintain a list of compatible alternatives. The one I ran into was {{***}} (which screams to be migrated to Lua in any case), but I've worked around it manually for now until I see how often it crops up. --Xover (talk) 07:16, 24 March 2021 (UTC)

@Xover: Hmm, so I think there are a few options here, in no particular order (and with minimal thought given so far), and can be mixed and matched:

Provide wholesale alternative templates e.g. {{*** inline}} (but we have enough templates)
Give {{***}} and friends an inline parameter (but we don't really want to add more gotchas to ppoem)
Convert {{***}} and friends to <span style="display:block;"> (semantically this might not even be the worst idea ever: are they really divs (division/section) anyway?)

Adding a magic syntax to ppoem to make a line a div isn't an option because stanzas are (rightfully) p's, so you can't have divs or other p's in there. Inductiveload—talk/contribs 09:38, 24 March 2021 (UTC)

I'm not sold on the argument that much of anything is actually properly a p, but be that as it may…

I think implementation should be a bit mix and match, unless something points itself out as the One True Way. The key point there is to document the common gotchas, and anywhere we can reduce the need for docs (for reading them, not so much the writing of them) by making the relevant templates intuitive or "just work" the better.

I don't generally think converting div to span with display:block has much point (it's a distinction without a difference in HTML5; we mostly only notice due to p-wrapping and other parser messes). Better would be a move away from 1= templates and towards /s+/e templates for blocks, to make clear what the model is. For users it would also be more intuitive (less cognitive load) even with more templates, because the inline and block versions would (if done right, and in most-but-not-all cases) appear to be one template, with how to call it given by the context.

And I'm generally on the wait and see train on this, until we see how many of these there are. The only reason I'm pointing a gun at {{***}} is that internally it uses {{loop}} to do its thing, and that one seriously needs to be killed with fire. Since it needs to be gutted anyway we might as well find some way to make it work well in ppoem too. --Xover (talk) 10:50, 24 March 2021 (UTC)

@Xover: "I'm not sold on the argument that much of anything is actually properly a p": how do you mean? Make stanzas a div? That would definitely work and allow block children. But a stanza certainly feels like a p (and as we all know, feels > reals), though since they have a class anyway it makes little practical difference.

"I don't generally think converting div to span with display:block has much point": indeed, it's almost sophistry, but, at least for {{***}} (and a few allied things), it's not really a div. In fact it's semantically possibly more like an HR element. Which might be possible with per-work CSS (it needs :after selectors and won't copy-paste), but won't be practical with templates.

"some way to make it work well in ppoem too": some way to use span instead of div would be easy even as it is, but I haven't thought about it enough to say if that's a good idea.

"wait and see train on this": agreed. Inductiveload—talk/contribs 11:16, 24 March 2021 (UTC)

p is a brainfart from a bunch of old farts back in the early nineties (yes, TimBL and DanC, I'm talking about you ;)), inherited from the work on GML back in… I actually don't remember when IBM started work on that, and can't be arsed to go look it up… but it was a model of what HTML would be used for that was very coloured by the fact that the web did not exist. It reflects the idea that HTML is primarily a way to mark up documents (and I mean double-spaced, typewritten, stacks of dead trees here) and needs to be able to express all the customary parts of such artefacts. It's from a conception of HTML as "a simpler SGML with active hyperlinks", rather than the foundation of the modern web (the majority of which much more resembles an application than a document). When div and span were introduced in HTML 4.01 (well, really in HTML 3.2+ iirc, but that's a different story) it reflected amassed experience that a paragraph tag is way too narrow and limiting for the kinds of things the web needs, and brings along too many assumptions from the stacks of dead trees. In that old conception of the web, p would actually be incorrect here, because a stanza should have its own tag. In the new conception of the web a stanza, like a paragraph (in most contexts), is just another kind of logical grouping of a page (i.e. a div) to which more specific semantics can be added with a microformat. On enWS the only real proper use for p would be marking up the paragraphs in the works we reproduce, but that would only make sense if we could control it directly (the parser's p-wrapping has way too many problems).

In any case… I can't think of a single use case here where p would be appropriate and where it wouldn't cause far more problems than it was worth. We inherit some default styling in both UAs and MW for div too, but overall it has far fewer problems. Not that, you know, it's a pet peeve or anything… :) --Xover (talk) 12:02, 24 March 2021 (UTC)

@Xover: Well it sounds like you know this in more detail than I do! Perhaps I've overestimated the usefulness of p based on how pervasive it is. So should we make stanza a div (it's a trivial thing to do)? None of the CSS should be affected, at least. Inductiveload—talk/contribs 12:09, 24 March 2021 (UTC)

Bah. I'm mostly just curmudgeoning, and mostly due to T253072. If it's trivial I'd argue lamely in favour of switching, otherwise I'm just venting. The main reason for switching is that div inherits fewer pre-defined behaviours and styles, but on the other hand it works just fine with p so switching might necessitate explicitly specifying some of that stuff. --Xover (talk) 14:13, 24 March 2021 (UTC)

Ppoem, the Liberator

Latest comment: 3 years ago · 6 comments · 2 people in discussion

Ok, I've now finished War, the Liberator, and Other Pieces. I did half of it using {{sbs}} and then switched to {{ppoem}}. And finally I went back and converted the {{sbs}} uses to {{ppoem}}, swapped out {{em}} and {{gap}} with : and ::, and other features and fixes you added as I was working.

The experience so far is that, modulo the issues you address along the way, ppoem works great, is rock solid, is intuitive to use (modulo learning curve, caveat custom syntax and model), solved all the needed problems for this work, and despite the oddball and kinda fragile start/stop hinting there were essentially no problems on transclusion (which is more than I can say for any of the other ways we deal with page-crossing!). Converting sbs to ppoem was straightforward (which I think means they actually share far more in philosophy than is immediately obvious), and resulted in immediate improvements: simpler (and less) markup, far fewer visual glitches, and just overall better in all ways. Compared to all other ways to format poems (and similar constructs) ppoem is much better in every way.

The drawbacks and annoyances are the custom syntax that is very different from everything else we use, and can reuse very little existing acquired knowledge; keeping track of what kind of start and end model to use is a bit taxing, and will be moderately difficult for many contributors; and the need to have special syntax (<<) after a dropcap is not at all intuitive and so resists establishing muscle memory (I kept forgetting, even after I figured out why it was needed).

I think the first issue probably argues that we should be completely uncompromising in keeping the custom syntax small and internally consistent (no special-casing, no solving every need that pops up using new magic). The second will need some noodling, but can probably be alleviated somewhat by using good keywords that people intuitively understand, possibly encouraging putting |end= at the actual end of the template call, and maybe even some fancy GUI tool to manage that aspect (or possibly just something EasyLST-ish). The last issue we've discussed elsewhere so I won't go into it here, but it's sufficiently annoying that it's worth a couple more cycles coming up with possible DWIM solutions.

All in all I'd say ppoem as it stands is already a massive success, and but for the need for much more testing on real world data and some caution around the custom syntax / model, I'd say it could with great benefit have been pushed to the community already. I'm going to go back over some of my existing works that use {{sbs}} and convert them to see if there are any more gotchas to be found, and because I'm already convinced that ppoem means sbs can no longer justify its existence (pre-wrap/poem formatting was its raison d'être; ppoem does that far better, and even though sbs has other functions they probably don't justify its existence alone). --Xover (talk) 11:27, 24 March 2021 (UTC)

Thank you for the vote of confidence. Except for the CSS finagling and the DIs, I'm fairly happy with the outcome so far, too. :-)

I'm still thinking about the DI stuff. I think there might be better ways to handle it, but I haven't gotten it just yet. Ideally, the "<<" could just die one day.

Re the start end model, the only thing I can really think of other than your suggestions is "moar magic" with another magic syntax on the first/last lines like "=> same-line" at the start and "stanza-break =>" at the end, but I'm far from convinced that that's actually a reduction in cognitive load. At least parameters are familiar and explicit (and can be picked out by existing tools like mwparserfromhell).

Re "completely uncompromising in keeping the custom syntax small and internally consistent": absolutely, I do not want this to be {{ts}} 2.0. Inductiveload—talk/contribs 11:39, 24 March 2021 (UTC)

On a completely unrelated note… Does the 4em right margin need to be on the stanza (vs. the lines)? It seems to be the main culprit in making centered stuff fail to align sensibly (by pushing all the contained lines too far left relative to centered stuff outside the ppoem). The left margin for the hanging indent that the right margin is, I think, compensating for is attached to the lines, so my immediate assumption would be that the right margin should be too. --Xover (talk) 18:50, 24 March 2021 (UTC)

@Xover: this is indeed part of what is messing with the alignment. The right margin is actually there to make sure that the lines, which are 100% wide (of the stanza), don't lose their right-hand 4em on a small screen. Although they are 100% wide, they're also 4em to the right, so they "stick out" of small containers.

And why the merry Felicity is there a width:100% there anyway, I hear you ask? That's because if you don't force the line to be as wide as the stanza, a drop initial will definitely wrap its line (whereas with 100%, it will only wrap if the line is within 4em of the longest in the whole poem).

I'm sure there's a better way somewhere, but this way is the best I could figure out with my fried brain. Inductiveload—talk/contribs 19:01, 24 March 2021 (UTC)

Would width: calc(100% - 4em); do it? --Xover (talk) 20:31, 24 March 2021 (UTC)

No, because you actually want the 4em to stop the wrapping with (smaller) drop initials. But I am not sure the cure isn't actually worse than the ailment here. Inductiveload—talk/contribs 21:16, 24 March 2021 (UTC)

Dropped initial

Hello. It seems that after your edits of the template {{Dropinitial}} the top margin parameter stopped working (I did not check the other margin parameters), see User:Jan.Kamenicek/Sandbox. Can you also have a look at it, please? --Jan Kameníček (talk) 11:00, 23 March 2021 (UTC)

@Jan.Kamenicek: Ugh, yes, dagnabbit! I didn't notice that at the time. I guess my eyes are so far out of calibration that I missed that 0.1em in the docs! Thanks for the report - I think this should fix it. Inductiveload—talk/contribs 11:48, 23 March 2021 (UTC)

Great, now it seems to work as expected, thanks. --Jan Kameníček (talk) 13:06, 23 March 2021 (UTC)

The bot sets the alt parameter e. g. as alt=G on the {{di}} template instead of the usual alt=G on the image, see e. g. here. Is it intended? If so, is there any advantage? Imo it makes it less comprehensible. --Jan Kameníček (talk) 18:20, 25 March 2021 (UTC)

Darn, I thought I'd caught all those. That was a trip-up in mwparserfromhell because the alt parameter was being set in the {{di}} template, not in the image call. {{di}} doesn't have an alt parameter at all (it's simply parameter 1), so setting the alt parameter there did nothing at all. Inductiveload—talk/contribs 18:35, 25 March 2021 (UTC)

I see, thanks for the explanation. --Jan Kameníček (talk) 21:20, 25 March 2021 (UTC)

Table across two pages

Hello! I saw your Help:Page_breaks#Tables_across_page_breaks and need some help with my pages. I tried to do the poem which runs from Page:An Exposition of the Old and New Testament (1828) vol 3.djvu/28 into Page:An Exposition of the Old and New Testament (1828) vol 3.djvu/29 in the same format as I had done the one at Page:An Exposition of the Old and New Testament (1828) vol 3.djvu/21. I've tried to follow your instructions, but the eight lines of the poem (which are not two separate stanzas) do not line up at An Exposition of the Old and New Testament (1828)/Job. Can you help? : PeterR2 (talk) 10:20, 31 March 2021 (UTC)

@PeterR2: This should not use a table at all. The (current) easiest way to deal with this is to use {{block center/s}} and <poem>:

Page 1:

{{block center/s}}
<poem>
Line 1
Line 2
</poem>
----------------------↑ Body/footer ↓
{{block center/e}}

Page 2:

{{block center/s}}
----------------------↑ Header/body ↓
<poem>
Line 3
Line 4
</poem>
{{block center/e}}

Using tables for poems is bad for all sorts of reasons. There is work underway to improve poems, but for now, something like the above works fine. See H:POEM for more information. Inductiveload—talk/contribs 10:37, 31 March 2021 (UTC)

Thank you! I will try to apply this to the other poem earlier in the volume. I notice that page numbers don't usually cause a break in An Exposition of the Old and New Testament (1828)/Job - is there any way of having the whole eight lines run smoothly without a gap? : PeterR2 (talk) 10:42, 31 March 2021 (UTC)

@PeterR2: I don't see a gap between the halves of the page: https://ibb.co/Hpxbp0F. What do you currently see? You might need to purge the page. Inductiveload—talk/contribs 10:46, 31 March 2021 (UTC)

This in Vivaldi: https://ibb.co/wK3gt3Q - but yes it looks fine in Firefox and Chrome so don't worry! : PeterR2 (talk) 10:53, 31 March 2021 (UTC)

That's odd: I see this in Vivaldi: https://ibb.co/PWQmc92. And since Vivaldi uses the same engine as Chrome, I wouldn't expect it to behave differently between the two. Are you sure you don't have some kind of customisation setting a margin somewhere? Inductiveload—talk/contribs 11:04, 31 March 2021 (UTC)

Module:New texts/data

Is this going to be a manual process and reliant on you? If so, that doesn't seem sustainable or reliable. I would have thought that we could be getting user:Wikisource-bot to do that more reliably. I would also have thought that we could set something up as a JSON page via Special:ChangeContentModel somewhere. I could be dreaming, but could we have something that scrapes an archived line in "new texts", pokes it into a JSON file, and then removes the line? If we set up the schema, I am happy to go back and convert all the previous years into JSON pages. The one thing that we will need to allow for is updates for disambiguation and moves. — billinghurst sDrewth 00:17, 18 March 2021 (UTC)

While on that, I am guessing that the Module: namespace is not the long-term home for the data. Plus, if we are recording JSON data like this, I would love it if we could also record the WD item number. I think that there is long-term value, and we should be able to run queries and bots, and maybe help twittify stuff more readily. — billinghurst sDrewth 00:22, 18 March 2021 (UTC)

@Billinghurst: the idea was to transition entirely to the module, since {{#invoke:New texts|new_texts|limit=7}} provides exactly what you need. Then, there's no archiving needed at all, except for at the end of a year, when you just copy the whole durn thing to the year's archive page. Template:New texts/testcases provides a comparison.

I have (finally) figured out how to inhale JSON. So now the data lives at Template:New texts/data.json, and archived data would be at Template:New texts/data/2020.json etc.

Re "the one thing that we will need to allow" (updates for disambiguation and moves): this is the same as currently - update the link target (i.e. the title value).

Re Wikidata, it's certainly possible to link to WD. However, due to the number of items, you will not be able to use it to construct pages like Wikisource:Works/2021 if they have more than about 400 entries (and we're on track to break that limit), because that's the limit on how many WD items a single page can load. So we can't fall back entirely onto a list of only Q-numbers, awesome though that would be. With PWB, at least, it's trivial to get the Q-number for a page given the title value, so you can work backwards. Inductiveload—talk/contribs 10:26, 18 March 2021 (UTC)

Ignorant question. Does tabular JSON (mw:Help:Tabular Data) have any benefit for us? I see it is an available content type. I don't trust users to edit JSON files, though I wonder about their ability to edit a table. — billinghurst sDrewth 05:01, 13 April 2021 (UTC)

@Billinghurst: I can't get it to work here: do we have the Data namespace turned on? Tables would be useful for lots of stuff (e.g. volume data that Wikidata doesn't seem bothered about). Inductiveload—talk/contribs 21:23, 13 April 2021 (UTC)

No idea. I was prodding people in irc #mediawiki though no one took the bait. I will prod some other avenues, though might be a "meh". — billinghurst sDrewth 23:09, 13 April 2021 (UTC)

In relation to your periodicals issue, there are a number of library cataloguers at Wikidata; it is just a matter of finding one to help. I saw someone in the past couple of weeks but for the life of me cannot remember where. You may find someone through Wikidata:Status updates, or you could try emailing someone like Ruth and asking for help, or asking them to point you to someone, as I am sure that they are a tight community. Let me prod someone on twitter to see if they can point someone to answer the question. — billinghurst sDrewth 23:31, 13 April 2021 (UTC)

https://twitter.com/billinghurstwik/status/1382114885786497025 — billinghurst sDrewth 07:46, 14 April 2021 (UTC)

@Billinghurst: Well, I suppose I am pleased that I'm not the only one who finds the ontology tricksy. Inductiveload—talk/contribs 08:02, 14 April 2021 (UTC)

Found them, so pushing from that direction d:User_talk:UWashPrincipalCataloger#May_I_borrow_your_time? — billinghurst sDrewth 23:50, 14 April 2021 (UTC)

Wikisource:Works/2021

January starts at 2., then February resets and starts at 3., March resets and starts at 4. Novel, though I am not sure that it is the objective. Can we get the count to start at 1 and then continue into the next month? Thanks. — billinghurst sDrewth 13:03, 2 April 2021 (UTC)

Done Inductiveload—talk/contribs 15:59, 2 April 2021 (UTC)
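
The requested behaviour is just a running offset: each month's list starts one past the total number of entries in the months before it. A minimal JavaScript sketch of that arithmetic, assuming the number of entries per month is already known (the function name monthStartNumbers is illustrative only, not the actual module code):

```javascript
// Given the number of entries in each month (in order), return the
// number each month's list should start at, so that numbering runs on
// continuously from January instead of restarting every month.
function monthStartNumbers(countsPerMonth) {
  const starts = [];
  let next = 1; // the very first entry of the year is numbered 1
  for (const count of countsPerMonth) {
    starts.push(next);
    next += count; // the following month picks up where this one ended
  }
  return starts;
}
```

For example, with 3 entries in January, 2 in February and 4 in March, the lists would start at 1, 4 and 6 (in HTML terms, February's list would be an `<ol start="4">`).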

Problem with refilling Index Page

Just tried your awesome new Refill Index Page link and I got a bit of an oddity. It says [[Author:H. G. Wells|Author:H. G. Wells]]. Shouldn't it be [[Author:H. G. Wells|H. G. Wells]]?

Also, the Volume is coming out as [[/14|14]], leading to Index:The Works of H G Wells Volume 14.pdf/Volume 14. Shouldn't it be [[%series title%/14|14]]? This is probably a good way of thinking about title vs series title: %title% should produce [[%title%]], Volume %volume%, while %series title% should produce [[%series title%/%volume%|%series title% (Volume %volume%)]].

Finally, the edit text says

"You are editing in the Index namespace, see Editing help and Index pages

This page includes a form for entering details about a work. There is a gadget to auto-populate fields from the File: at Commons:."

Shouldn't it say

"You are editing in the Index namespace, see Editing help and Index pages

This page includes a form for entering details about a work. There is a gadget to auto-populate fields from the File:%link to file at commons% at Commons:." Languageseeker (talk) 13:03, 4 April 2021 (UTC)

@Languageseeker: For me, it fills the author as [[Author:H. G. Wells|]], which comes out correct (it's called the Pipe trick).
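
The pipe trick is applied when the page is saved: MediaWiki fills in the empty label from the link target. A simplified JavaScript sketch of that rule (drop the prefix up to the first colon, then a trailing ", context" and a trailing parenthetical) shows why [[Author:H. G. Wells|]] displays as "H. G. Wells". The function name is hypothetical, and this is an approximation of MediaWiki's pre-save transform, not its actual code:

```javascript
// Approximate the label MediaWiki's pipe trick produces for [[target|]].
function pipeTrickLabel(target) {
  let label = target
    .replace(/^:?[^:]+:/, "") // drop a namespace-style prefix such as "Author:"
    .replace(/^:/, "");       // drop a bare leading colon, e.g. [[:Foo|]]
  label = label.replace(/, .*$/, "");         // "Seattle, Washington" -> "Seattle"
  label = label.replace(/ ?\([^()]*\)$/, ""); // "Pipe (computing)" -> "Pipe"
  return label;
}
```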

As I have said already, you should set the title at the Commons page, not the series. The series does not import to the Index page and you have not set any title at all at Commons. So it imports nothing. We do not always place works in a series as subpages of the series, because "series" can be quite a nebulous concept and works within a "series" may well be full-on works in their own right (as opposed to volumes of a single work). For example, The Garden of Eden is part of the New-Church Popular Series but it is a top-level book in its own right.

I have modified the edit notice for Index pages (I actually have that turned off in my CSS to save space, so I haven't seen it for some time!) Inductiveload—talk/contribs 13:32, 4 April 2021 (UTC)

Index:The_Works_of_H_G_Wells_Volume_14.pdf Looks better, but I'm still getting the problem with the Author. Also, right now, we have two different links for the transcluded text: ''[[The works of H. G. Wells]]'' and [[The works of H. G. Wells/Volume 14|Volume 14]]. Shouldn't we just keep the second?

Also, the link does not go to Commons directly. Could it go to Commons directly? Languageseeker (talk) 14:00, 4 April 2021 (UTC)

@Languageseeker: OK, that was bizarre, the JS was being pre-processed with the Pipe trick. Should work now.

The link now goes directly to Commons.

We usually have a link to the top of the work (in this case, it would be a volume list of the 28 volumes) and a link directly to the volume transclusion. E.g. Index:EB1911_-_Volume_01.djvu. Inductiveload—talk/contribs 14:34, 4 April 2021 (UTC)

Works beautifully. Thanks Languageseeker (talk) 14:45, 4 April 2021 (UTC)

Done Languageseeker (talk) 15:30, 4 April 2021 (UTC)

Batch Import Periodical or External Scan Link

I've noticed that there are a lot of pages, such as The Atlantic Monthly, where there are {{ext scan link}}s, suggesting that users might find it troublesome to manually import dozens of volumes even if they are able to find the links. Would it be possible to batch import them and set up the template for the volumes? Languageseeker (talk) 13:50, 4 April 2021 (UTC)

It's possible, but gathering all the requisite metadata makes it a little more labour-intensive than you might think. At the least, you would need, for TAM, for each volume:

The date range: e.g. November 1857 – May 1858
Publication year (e.g. 1858)
IA or HT ID (ideally IA because then you don't need to mess with reconstructing a Hathi scan, which takes a long time)
City (probably always Boston?)
Publisher if known
The license (easy up to 1926)
The Commons category (e.g. commons:Category:The Atlantic Monthly, 1858)

None of it is hard, just a bit of a faff. I use a spreadsheet to generate commons files and Index pages. Then it's just a matter of battling the Commons uploader API which is having a bit of a sulk at the moment. Inductiveload—talk/contribs 14:20, 4 April 2021 (UTC)

Perhaps we can do something like {{ext scan link|url|desired name}} and pass that off to the IA upload tool? If the file appears on Commons, then change to {{small scan link|desired name}}. Then we wouldn't have to worry about guessing the file name, creating the Index Page, or bad index pages. Then this tool would become a sort of auto-filler for the IA tool. It would also be useful in cases when the IA tool failed so that the uploads could be retried automatically. What do you think? Languageseeker (talk)

That would need support from the IA-Upload tool which is in various degrees of broken-down-itude at any given time. A way to prefill with a URL like "https://ia-upload.toolforge.org?id=XXXX&filename=YYYY&ext=pdf" would be handy, for sure, and I have wished for it before (but not hard enough to figure out how to install and hack on the tool....yet). Also with a little bit of care, you can do better than blindly upload from the IA using their terrible metadata, like, for example, putting the dates covered in the description. Inductiveload—talk/contribs 14:41, 4 April 2021 (UTC)
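
As noted, IA-Upload would need to support such prefilling, so the following is purely hypothetical: a JavaScript sketch of building the kind of prefill URL described, with id, filename and ext as assumed query parameter names taken from the example above (they are not known to be accepted by the tool):

```javascript
// Build a HYPOTHETICAL IA-Upload prefill URL. The tool may not support
// these parameters; the names simply follow the example in the discussion.
function iaUploadPrefillUrl(iaId, filename, ext) {
  const url = new URL("https://ia-upload.toolforge.org/");
  url.searchParams.set("id", iaId);
  url.searchParams.set("filename", filename);
  url.searchParams.set("ext", ext);
  return url.toString(); // spaces and other special characters are percent-encoded
}
```

Using URLSearchParams rather than string concatenation means file names containing spaces, ampersands or brackets survive the round trip.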

It might be possible to just borrow code from the IA tool, because lots of it wouldn't be needed. Basically: get the URL, build the link to the PDF, Url2Commons, import metadata. A new, simpler IA upload tool with no GUI and no user interaction. Languageseeker (talk) 14:53, 4 April 2021 (UTC)

@Languageseeker: It would be a nice trick, and probably not that hard. But it'll still suck up the IA's rubbish metadata and dump the files in a generic category, whereas I think you could ideally do a bit better than that. Not least, with a bit more care, you can create the Index page too, which you cannot do with Url2Commons.

It's on my list to enable a better batch upload system for Wikisource, but the list is long and the days are short. Inductiveload—talk/contribs 14:58, 4 April 2021 (UTC)

I know that you're swamped and there is way too much to do. Would it be possible to set up a vote on feature requests? Maybe one month for proposals and then a ranked vote?

I know that metadata on IA is not the greatest and often requires manual editing; no tool will ever fix incorrect or absent metadata. However, trying to upload 95 volumes manually takes hours of just staring while tasks run in the background. It would take less time to batch upload and then fix metadata manually. Maybe it would be possible to just borrow code from Fæbot? Languageseeker (talk) 15:06, 4 April 2021 (UTC)

IME, it's much quicker to set all the metadata in a big ugly spreadsheet and then upload it all in one go (staring at the terminal is strictly optional). Opening and editing 95 file pages and 95 index pages (or fiddling with a bot to make the changes retroactively) is not very exciting.

I plan to post my uploader script at some point (a point where it isn't full of API keys and imports of random modules from various places). "One day" it might morph into a full-on toolforge tool, but that needs UI and all sorts of bulletproofing.

Voting on feature requests is all very well, but you still need someone to do them. Comm Tech is already being nice to us with the exporter and OCR projects, and there are not many people "into" the JS/Tools side of things. Most of the tools have been rotting for years: half the existing gadgets don't work as promised. Inductiveload—talk/contribs 15:20, 4 April 2021 (UTC)

Uhm. Uploading and adding an index for 95 volumes is one thing. What takes effort is verifying that all pages of all 95 volumes are present and legible. Just mirroring IA is something a bot can do at any time. --Xover (talk) 12:46, 5 April 2021 (UTC)

@Xover: indeed, which is why I'm not on a multi-thousand volume mass upload spree, because locating the good scans and filling in metadata is more work than actually pressing the button on a list of 95 IA IDs and using their awful metadata without even checking.

On the other hand, the IA (or HT) pagelists (e.g. https://pagelister.toolforge.org/, or a local equivalent) help a lot because if you see this:

<pagelist
1="Cover"
2to4="–"
5="Title"
6to9="–"
10=2
781to783="–"
784="Cover"
/>

then you know that volume is very likely complete, because the numbering is continuous right to the end (actually, you normally know that earlier, as soon as you see that it's a non-BW-Google scan). So that helps a bit. Inductiveload—talk/contribs 13:30, 5 April 2021 (UTC)
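
The heuristic above (a single numeric offset whose numbering runs unbroken until only back matter is left) can be sketched in JavaScript. This assumes a simplified reading of the pagelist syntax, where N="label" and NtoM="label" set labels and N=K numbers scan page N as K and counts up from there. The helper names are illustrative, and real pagelists can use more forms (e.g. roman numerals) than this handles:

```javascript
// Parse the attributes of a <pagelist /> tag into {from, to, value} rules.
function parsePagelist(text) {
  const rules = [];
  const re = /(\d+)(?:to(\d+))?=(?:"([^"]*)"|([^\s/>]+))/g;
  let m;
  while ((m = re.exec(text)) !== null) {
    const from = Number(m[1]);
    const to = m[2] ? Number(m[2]) : from;
    rules.push({ from, to, value: m[3] !== undefined ? m[3] : m[4] });
  }
  return rules;
}

// If there is exactly one numeric offset, report the stretch of scan pages
// it numbers: from that page up to the next labelled page (or the end).
function numberedRun(rules, pageCount) {
  const numeric = rules.filter((r) => /^\d+$/.test(r.value));
  if (numeric.length !== 1) return null; // restarts or no numbers: check by hand
  const start = numeric[0];
  const nextLabelled = rules
    .filter((r) => r.from > start.from)
    .reduce((min, r) => Math.min(min, r.from), pageCount + 1);
  const endPage = nextLabelled - 1;
  const firstNumber = Number(start.value);
  return {
    startPage: start.from,
    endPage,
    firstNumber,
    lastNumber: firstNumber + (endPage - start.from),
  };
}
```

For the pagelist above on a 784-page scan, this reports scan pages 10 to 780 numbered 2 to 772 with no restarts; whether 772 matches the last printed page number is still a human check.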

I’m not advocating going on an uploading spree, but making uploading faster and easier is something that can benefit users. On one hand, IA and Hathi Trust have many duplicates that we don’t need. Also, nobody needs to proofread a reprint that has no illustrations, author input, or scholarly value. We’re a curated collection, not donations in a box. Curation takes time. However, every user has a limited amount of time. Do we want to spend their time doing something that a bot can do, or on something that a bot cannot? A bot can upload 95 volumes, but it cannot verify the completeness of a work. With such a bot, I can go to IA, find the best scan, set up the links on the Author page, press save, and then do something else. In a day, the files will be there and I can proceed to set up the index pages. Languageseeker (talk) 14:55, 5 April 2021 (UTC)

BTW, I’ve seen plenty of incomplete non-Google scans especially from the LOC. Hathi Trust has a feedback button that we can use to report missing pages. Languageseeker (talk) 14:57, 5 April 2021 (UTC)

Media matters

cf. this. It occurs to me that we may well still have files referenced using the Media: prefix. I run into them now and again, so the odds someone stuffed one into {{di}} somewhere is at least non-zero. --Xover (talk) 12:41, 5 April 2021 (UTC)

Done (but AFAICT none found so far), also for Image which indeed had a few. Inductiveload—talk/contribs 07:41, 6 April 2021 (UTC)
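
The check being extended here amounts to recognising the namespace aliases a file reference may carry. The bot's actual code is not shown in this thread; as an illustration only, a minimal JavaScript sketch that reduces File:, Image: (the legacy alias) and Media: references to a bare file name, with the prefix matched case-insensitively (the helper name is hypothetical):

```javascript
// Strip a File:, Image: or Media: prefix from a file reference so that all
// three forms compare equal. Prefix matching ignores case.
function bareFileName(ref) {
  return ref
    .trim()
    .replace(/^(?:File|Image|Media):/i, "")
    .trim(); // tolerate a space after the colon
}
```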

{{FIS}} is broken

Hi. {{FIS}} is now introducing paragraph breaks before and after the image. I have used this template hundreds of times to offset images where the text is supposed to flow around them without a break. — Ineuw (talk) 13:58, 5 April 2021 (UTC)

@Ineuw: A link to an affected page would be handy. Inductiveload—talk/contribs 14:04, 5 April 2021 (UTC)

Apologies, Page:The_Rise_of_the_Russian_Jew.djvu/1.— Ineuw (talk) 14:07, 5 April 2021 (UTC)

The problem is {{fs85}} is a block element. FIS is a span, but if you put a div inside, it'll break the surrounding paragraph. Inductiveload—talk/contribs 14:15, 5 April 2021 (UTC)

This is a very belated thank you. I didn't forget; it's just that I was embarrassed about the number of stupid problems I was requesting help with around that time. Thanks again for all your help. — Ineuw (talk) 01:47, 11 April 2021 (UTC)

Script Request... Recent Activity warning...

Hi.

Would it be possible to have a script that adds a bar to the top of pages indicating that a page has been edited recently, or that there is a 'frequency' of edits to related pages?

The thinking here is that as it can take some time to setup or edit certain pages (like Index pagelists), if you have other 'fast' editors, there is a high potential for edit conflicts, which for long-term contributors can become a frustration. A warning about recent activity can potentially avoid these.

I am not sure how plausible it is to have a mechanism, during the loading of the Edit page, that warns about potential edit conflicts before you even start editing a page, though. I'm also not sure if the edit requests are tracked in an accessible way that would make something like "X is already editing this page..." possible, the way Discord has an "X is typing..." warning in near real-time.

ShakespeareFan00 (talk) 23:05, 5 April 2021 (UTC)

@ShakespeareFan00: Actually quite fun little script to do: User:Inductiveload/ActivePageAlert. Add to your JS like this:

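{
  // Load the ActivePageAlert script.
  mw.loader.load("//en.wikisource.org/w/index.php?title=User:Inductiveload/ActivePageAlert.js&action=raw&ctype=text/javascript");

  // This callback runs when the script fires its config hook, passing its
  // config object. Here a per-user limit is added (see the script's
  // documentation page for the meaning and units of timeLimit).
  mw.hook( 'active_page_alert.config' ).add( function ( apa ) {
    apa.cfg.userLimits = [
      {
          user: 'SomeUser',
          timeLimit: 120
      }
    ];
  } );
}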

Inductiveload—talk/contribs 12:50, 6 April 2021 (UTC)

Bit a help with Template needed

Can you help me out with {{The_works_of_John_Ruskin}}. The volumes are 01, 02, etc. Languageseeker (talk) 23:10, 6 April 2021 (UTC)

For {{The collected works of Henrik Ibsen}}, volume 1 is a pdf.

@Languageseeker:

Done × 2. Inductiveload—talk/contribs 23:14, 6 April 2021 (UTC)

Common Link to Small Scan Link

Is it possible to convert all {{Commons link}} to {{small scan link}}? This seems to be an ancient way to say that the file for proofreading exists on Commons, but we now have Proofread Page. Also, why is there {{small scan link}} instead of {{scan link}}? Seems like extra typing for nothing. Languageseeker (talk) 00:46, 7 April 2021 (UTC)

@Languageseeker: {{Commons link}} just means there is no Index page (yet); normally you skip that and go right to a scan link. Make an index page and you can change to a scan link. You can use {{ssl}}, or even User:inductiveload/save load actions if you want. Or AutoHotkey or similar. Inductiveload—talk/contribs 00:51, 7 April 2021 (UTC)

Problem with Small Scan Link and multi volumes that do not start with 1

I'm having a problem with {{small scan link}} where if I don't set the first volume to 1 it throws Lua error: At least an Index page is required. For multi volume series, such as the Complete Works of John Ruskin at Author:John_Ruskin, there are cases where the first volume is not actually 1. Languageseeker (talk) 00:50, 7 April 2021 (UTC)

{{ssl}} contains the instructions for that case: use the nameN parameters. Inductiveload—talk/contribs 00:52, 7 April 2021 (UTC)

What's the syntax for nameN? I don't see any examples on the template page. Languageseeker (talk) 01:02, 7 April 2021 (UTC)

@Languageseeker: (What have you tried?) Inductiveload—talk/contribs 01:05, 7 April 2021 (UTC)

I tried {{small scan link|vol3=The_works_of_John_Ruskin_(IA_worksofjohnruski03rusk).pdf}} Languageseeker (talk) 01:06, 7 April 2021 (UTC)

Better Footnotes

I noticed that when transcribing footnotes, there is currently no way to preserve the original reference. Instead, Wikisource creates a new numbering scheme for footnotes that has no basis in the original text. See Page:The_New_Monthly_Magazine_-_Volume_101.djvu/74. Is there any way to create a template to override these? Perhaps something like this

For example,

To Godstowe's glade;{{footnote|<ref>See ''Reginald Dalton.'' Book iii. chap, v.</ref> and hallows all the scene<br />|*}}

which would transclude to

To Godstowe's glade^* and hallows all the scene

Page 62, *: See Reginald Dalton. Book iii. chap, v.

Languageseeker (talk) 01:18, 7 April 2021 (UTC)

@Languageseeker: We don't try to reproduce the original footnote numbering exactly, see Help:Footnotes_and_endnotes. Inductiveload—talk/contribs 01:40, 7 April 2021 (UTC)

Is there any particular reason why not? It seems like a fairly big change to alter the way that notes are referenced. Languageseeker (talk) 01:48, 7 April 2021 (UTC)

I don't know exactly, but it pre-dates me. I'm sure you could dig up discussions on the Scriptorium. Inductiveload—talk/contribs 01:49, 7 April 2021 (UTC)

It seems like a fairly controversial topic that comes up fairly regularly. User:EncycloPetey and User:billinghurst are against it while others keep requesting it. Their main concern appears to be that it makes footnotes lose their distinctiveness when transcluded. However, can we not preserve them when transcluding by including the page number? With the community fairly divided on this, would it not be better to put this up for a vote? Languageseeker (talk) 02:06, 7 April 2021 (UTC)

Because we are converting footnotes to endnotes, and the works we do typically have footnotes that restart on each page, per-page markers do not scale; I have seen endnotes with over 100 sources. PLUS we have a house style, and numbers would appear to be the consensus (and it pre-dates me too). Otherwise, I don't see that it is anything of particular concern: a citation is a citation, and a house style is a house style. — billinghurst sDrewth 07:58, 7 April 2021 (UTC)

@Billinghurst: The issue is that it is hugely inaccurate and a fairly large modification of the source text. I can understand converting footnotes to endnotes, but renumbering footnotes makes them impossible to reference. If a book had Page 15, †, then it's incorrect to say that this is footnote 27. It seems that the consensus occurred many eons ago before Proofread Page existed and now we can easily format them correctly. Maybe it's time to revisit this if it's become technically possible and easy to do this. Languageseeker (talk) 12:47, 7 April 2021 (UTC)

If someone truly needs the number from the original source, they have access to the scan page by simply clicking on the page number in the margin. If the original source is using symbols, where the same symbol is used repeatedly for the first footnote on every page, then reproducing that will mean many, many footnotes all marked identically, which is useless in an electronic format. --EncycloPetey (talk) 13:06, 7 April 2021 (UTC)

(ec) The conversation doesn't particularly belong on a user talk page. But do tell me which of the 22 *, 13 †, 8 ‡ or the seven 22 1s, 13 2s, 8 3s you think should stay the same when they become endnotes? Tell me that you would like to see the split references remain split as they come from different pages. There is nothing wikt:inaccurate, let alone hugely inaccurate, so please stop the rhetorical flourishes; the refs are automatically generated and they accurately reflect the text and position of the reference that is in the work. Show me one _inaccuracy_ on a properly formatted and proofread page. — billinghurst sDrewth 13:08, 7 April 2021 (UTC)
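
The three labelling schemes discussed in this thread can be compared with a small JavaScript sketch. The input format (one entry per printed page listing its footnote markers) and the function names are illustrative assumptions; the point is only that per-page symbols repeat once notes are gathered at the end, whereas sequential numbers or page-qualified markers stay unique:

```javascript
// Label footnotes gathered as endnotes under one of three schemes:
//  "sequential": 1, 2, 3, ... (roughly what Cite's default numbering gives)
//  "original":   the printed marker, which repeats from page to page
//  "page":       printed page number plus marker, e.g. "15†"
function endnoteLabels(pages, scheme) {
  const labels = [];
  for (const { page, markers } of pages) {
    for (const marker of markers) {
      if (scheme === "sequential") labels.push(String(labels.length + 1));
      else if (scheme === "original") labels.push(marker);
      else if (scheme === "page") labels.push(`${page}${marker}`);
      else throw new Error(`unknown scheme: ${scheme}`);
    }
  }
  return labels;
}

// Count the labels that repeat an earlier label.
function duplicateCount(labels) {
  return labels.length - new Set(labels).size;
}
```

With two printed pages marked "*", "†" and "*", the "original" scheme yields one duplicate label, while "sequential" (1, 2, 3) and "page" (15*, 15†, 16*) yield none.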

IA Scan Link Template

It turns out that there's a {{Internet Archive small link}}. Could it be improved so that if the file exists on Commons, it acts as a small scan link, and if not, it redirects to IA?

For example,

would check if {{Internet Archive link|newvoyageroundwo01damp}} exists on Commons, if so and filetype is PDF or DJVU, it would act as {{ssl|A new voyage round the world. - Describing particularly, the isthmus of America, several coasts and islands in the West Indies, the Isles of Cape Verd, the passage by Terra del Fuego (IA newvoyageroundwo01damp).pdf}}

if not, then it would go to the current code

<span title="Copy of this work at the Internet Archive" style="font-size: 83%; white-space:nowrap;">([https://archive.org/details/{{{1}}} IA])</span>

This should help users avoid having to try to upload the file through the IA tool only to find out that the file already exists because someone already uploaded it which is a colossal waste of time. Languageseeker (talk) 13:03, 7 April 2021 (UTC)

@Languageseeker: AFAIK there is no way to get the filename of a PDF or Djvu with a given IA ID in the info via Lua, and that means you can't do what you want. At best you could write a gadget to flag up "stale" Commons links in JS (e.g. make them red or whatever) and suggest a replacement. There are only 60 transclusions of {{Commons link}} (it's not a very common template), so I think this is not a very exciting prospect. If you're going to go around tagging with {{Commons link}}, you might as well use {{ssl}} and set up the index while you are at it. Lua is limited to mw:Extension:Scribunto/Lua_reference_manual. Inductiveload—talk/contribs 13:22, 7 April 2021 (UTC)

Oh ok. So you can do File -> Get Metadata, but not Find Metadata -> get File? Languageseeker (talk) 13:39, 7 April 2021 (UTC)

Pretty much, yep. You can do it in JS (presumably aping however the IA-Upload tool detects matching IDs), but not, AFAIK, in Lua (remember that gets pre-rendered by the server on save - if it does a search as part of that, how will the server know when to re-render if, say, you upload a matching file?) Inductiveload—talk/contribs 14:06, 7 April 2021 (UTC)
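
A rough JavaScript sketch of that JS-side approach: search the File namespace on Commons for the IA identifier, then pick a title following the "(IA id).pdf" / "(IA id).djvu" naming seen in the example above. The search uses the standard MediaWiki action=query&list=search API; the helper names are hypothetical, and relying on IA-Upload's naming convention is an assumption (files uploaded by other means will be missed):

```javascript
// Build a Commons API search URL for files mentioning an IA identifier.
function commonsSearchUrl(iaId) {
  const url = new URL("https://commons.wikimedia.org/w/api.php");
  url.searchParams.set("action", "query");
  url.searchParams.set("list", "search");
  url.searchParams.set("srnamespace", "6"); // 6 = File namespace
  url.searchParams.set("srsearch", `"${iaId}"`);
  url.searchParams.set("format", "json");
  url.searchParams.set("origin", "*"); // allow anonymous cross-site requests
  return url.toString();
}

// From a list of file titles, return the first scan named like
// "... (IA <id>).pdf" or "... (IA <id>).djvu", or null if none matches.
function findIaScan(titles, iaId) {
  const suffixes = [`(IA ${iaId}).pdf`, `(IA ${iaId}).djvu`];
  return titles.find((t) => suffixes.some((s) => t.endsWith(s))) || null;
}

// Putting it together (needs network access; a gadget would run this in
// the browser rather than at page-render time).
async function lookUpIaScan(iaId) {
  const response = await fetch(commonsSearchUrl(iaId));
  const data = await response.json();
  const titles = data.query.search.map((hit) => hit.title);
  return findIaScan(titles, iaId);
}
```

Because this runs client-side on each view, it avoids the cache-invalidation problem described above: nothing is baked into the saved page.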

"We're all mad here."

Ok, so one issue we will have to solve once and for all with ppoem is separator / elision lines.

Starting somewhere completely different, I've fallen into what seems to be an endless rabbit hole where we're using {{loop}}, {{

- - }}, {{…}}, {{separator}}, and probably a few more I've not had the misfortune to meet yet, plus manually spacing out asterii, dots, and middots with nonbreaking spaces.

So far I haven't really thought much about the solution; but I've concluded this is definitely a problem, and one that rises to the level of "ppoem must deal well with this", given both how difficult it is to solve inside ext:Poem and the relatively high frequency of such lines in poems in general.

I'm currently mulling over ways to fix and merge (or replace) various subsets of the above templates for this purpose, but the thought has also occurred to me that "it sure would be nice" to support it directly in ppoem. That way probably lies madness due to the number of knobs people can, and do, tweak with the existing templates, but I'm throwing it out there in case you see a potential for a clean solution where I can't. But however we approach it, I want to say the goal should be that once ppoem deploys generally there is One True Way™ to do such lines inside a ppoem, that is well documented, and all other ways should be avoided. --Xover (talk) 13:27, 7 April 2021 (UTC)

@Xover: I want to say flexbox, but I'm unsure if that will export in any kind of sane way. Inductiveload—talk/contribs 14:07, 7 April 2021 (UTC)

'New' math....

There seems to be a bot that updates formulae using MATH tags to resolve some issues; see https://www.mediawiki.org/wiki/Extension:Math/Roadmap

Could something like the bot there be implemented here? ShakespeareFan00 (talk) 18:34, 7 April 2021 (UTC)

@ShakespeareFan00: I'd certainly hope that they'll provide the fixes as needed with their bot. Is there any indication we have any issues manifesting at enWS? Inductiveload—talk/contribs 13:03, 10 April 2021 (UTC)

There may be a few affected pages. ShakespeareFan00 (talk) 13:12, 10 April 2021 (UTC)

The Complete Poems of Paul Laurence Dunbar

Thank you for your awesome work on The Complete Poems of Paul Laurence Dunbar. Languageseeker (talk) 00:32, 8 April 2021 (UTC)

Sword Blades and Poppy Seed in Author:Amy Lowell

Can you help clean this up so that it doesn't spam the PG category? Languageseeker (talk) 01:10, 8 April 2021 (UTC)

@Languageseeker: Done. Inductiveload—talk/contribs 01:47, 8 April 2021 (UTC)

Thanks Languageseeker (talk) 03:05, 8 April 2021 (UTC)

Metadata not loading

I'm having a strange problem affecting The Works of Charles Dickens where the metadata is not loading from Commons. It's basically all the red links on this page. When I tried to add Index:Works of Charles Dickens, ed. Lang - Volume 28.djvu and then reload the index data, it still didn't work. Languageseeker (talk) 00:48, 9 April 2021 (UTC)

@Languageseeker: The Commons page doesn't use Commons:Template:Book, that's why. The metadata filler takes the information from the Book template - if it's missing, there's not much it can practically do. Inductiveload—talk/contribs 00:53, 9 April 2021 (UTC)

Thanks. I fixed the metadata. Languageseeker (talk) 02:14, 9 April 2021 (UTC)

@Languageseeker: I don't think this really counts as "fixing" something. The metadata is now substantially worse. For a start, the title is wrong, there's no author and you've trashed the categorisation. That template wasn't used for no reason. The title of the work is The Works of Charles Dickens, the title of the volume is American Notes and Pictures from Italy. I think you need to be a bit more circumspect when storming into "fixing" "problems". In this case, filling in the Index page metadata manually would have likely been the easier way forward. Inductiveload—talk/contribs 02:28, 9 April 2021 (UTC)

Better? Languageseeker (talk) 04:55, 9 April 2021 (UTC)

@Languageseeker: Marginally, but still not as good as it was. Why did this even need changing at all? Just so you could do a one-off import with a helper gadget at Wikisource? Now that's done, what's the point of leaving the page worse off than before you changed it? Inductiveload—talk/contribs 06:49, 9 April 2021 (UTC)

I cleaned it up a bit more. My overall goal is to replace the needless non-standard template with a standard book template. It's not like the metadata was that great in the first place. Languageseeker (talk) 14:33, 9 April 2021 (UTC)

@Languageseeker: It was better than it is now, and it's not exactly hard to see where. The editor is still missing, you have mixed up the subtitle, the volume title and description, you have mixed up "location" and "city" and you have not replaced the category that the template added and removed the language tagging. Please either put it back how it was, or if you want to use the book template, make sure there is at least the same metadata there was before and in the right fields. Inductiveload—talk/contribs 16:22, 9 April 2021 (UTC)

Um, the incorrect publisher on Vol 28 was not actually the result of my edit. I looked over your template on Commons Template:Works of Charles Dickens volume and I noticed some issues. The major one is that the documentation for the Book template states that Volume should be a number while you have a text field (Volume={{{volume}}}: {{{title}}}). So, I believe that it might require fixing. I’m also not sure why you made City={{{location|}}} instead of City={{{city|}}}. Seems a bit confusing and this is causing an issue when changing it to the book template. In my edit, I put “With Introduction, Notes and General Essay by Andrew Lang” into the Description field so that the subtitle could be the name of the work. I don’t see a volume title in Book. I’m happy to revert if you fix your template to follow Commons guidelines, make sure that it can work on importing, and apply it to all the volumes. Otherwise, it seems to me that reverting the changes would just return us to a broken

"Um", you have done the import already, so that's specious. Also there is zero expectation that the Commons info needs to conform itself to whatever rudimentary heuristic the Wikisource helper script uses. If the script doesn't work, just deal with it and enter it manually. It only needs doing once anyway.

I don't know why city → location, but it's not that confusing if you just check the page before you save it. If you want to use Book, whatever, I have no issues with that, but breaking stuff to put pressure on to fix things you don't like is not constructive. Just compare what it was and what it is. There is still missing data. If you care so much about the volume title not having a dedicated field, you should raise the issue at commons:Template talk:Book (and get involved in fixing the issue, not just drive by with "um, this is not how I like it kthxbai"), not just abuse the description field (which is for "out of band" information like physical condition or whatever) and move on, leaving messed-up stuff in your wake that someone else has to deal with. Thousands of books have a volume title there. If you want to go on a mission to add a volume title field and tidy them up into a more semantic field, fine, I'd even say that's a good idea. But until then, leave things strictly better than you found them, and I think we can all agree that how you intended to leave it is not better than how it was. Inductiveload—talk/contribs 12:58, 10 April 2021 (UTC)

Let’s bury the hatchet on this one. I never meant to make things worse. I might have been a bit hasty and didn’t realize that was a custom template and not just a user error. I tried to fix it, but it didn’t seem to work out. I respect you and what you do for this site too much to want to create bad blood or hurt feelings. Languageseeker (talk) 05:36, 12 April 2021 (UTC)

(or socially-distanced equivalent). I have raised a query on commons:Template talk:Book about adding a volume title, because you are right in that forcing the volume number and title into one field is not ideal.

I'm sorry if I came across as sharp. I have no intrinsic desire to keep a work-specific template over Book, BTW, I just would like to make sure that there is strictly no less metadata. Ideally, of course, all the bibliographic info (as opposed to file specific) should be pulled from some structured data "somewhere" (Wikidata/Commons SDC/something else?) but I have no idea what that should be, and I don't have enough clones of myself to really dig into it right now, let alone embark on some mission to improve all the literal millions of books at Commons. Inductiveload—talk/contribs 07:26, 12 April 2021 (UTC)

Help with Splitting Images and Creating DJVU

Can you help split the images in Paradise Lost 1674 and create a DJVU out of them? Languageseeker (talk) 02:14, 9 April 2021 (UTC)

If you could use ScanTailor or similar to prepare a set of split images, I'll run it through the DJVU/OCRifier. Inductiveload—talk/contribs 02:36, 9 April 2021 (UTC)

I'm not very good with batch jobs. This guy uploaded 32 rare books without splitting the pages and it would be great to do them all [2], but I lack the technical skill. Languageseeker (talk) 05:25, 9 April 2021 (UTC)

@Languageseeker: It's not a batch job as such, ScanTailor is for processing scan images (e.g. splitting, etc). Give it a try before saying you can't do it.

I'll make 32 DjVus from 32 zips of images because I know that's quite hard and until I can get round to publishing the code and/or making a web-based tool others may not be able to do it easily. But doing 32 end-to-end extract/splits and making the DjVus and doing all the metadata for upload "on spec" is more than I have time for, sorry. Inductiveload—talk/contribs 07:03, 9 April 2021 (UTC)

No worries at all. I appreciate you looking into it. Languageseeker (talk) 20:25, 9 April 2021 (UTC)

author template/module picked up a comma section <-> contributor ?

Maybe I am losing it, though I don't remember us having a comma between the section name and the contributor when we used contributor, as now shows at McClure's Magazine/Volume 9/Number 1/May. Would you be so kind as to recheck, and if we didn't, can we please head that way? Thanks. — billinghurst sDrewth 11:29, 10 April 2021 (UTC)

@Billinghurst: Done. Inductiveload—talk/contribs 12:23, 10 April 2021 (UTC)

A Little Help with a template

I'm trying to create {{IAu}} to simplify uploading to Commons from IA. It basically takes the three parameters that the IA Upload tool needs and constructs a URL from them: {{IAu|Internet Archive ID|Commons file name|pdf or djvu}}. However, I'm running into an issue where

{{IAu|jesuitrelations169jesugoog|59|pdf}} works, but

{{IAu|cu31924092218191|The Jesuit relations and allied documents (Volume 27)|pdf}} doesn't.

Anyway you can take a look? Languageseeker (talk) 15:42, 12 April 2021 (UTC)

@Languageseeker: The problem is Parameter 2 has a space in it. This means the link is:

[https://ia-upload.toolforge.org/commons/fill?iaId=cu31924092218191&commonsName=The Jesuit relations and allied documents (Volume 27)&format=pdf Upload ...]

, which turns into a link with the URL "https://ia-upload.toolforge.org/commons/fill?iaId=cu31924092218191&commonsName=The" and the text "Jesuit relations and allied documents (Volume 27)&format=pdf Upload ...".

The solution is URL-encoding the URL parameters: like this. Inductiveload—talk/contribs 15:56, 12 April 2021 (UTC)
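For illustration, here is the same encoding done outside the wiki, as a minimal Python sketch using only the standard library (in wikitext, MediaWiki's {{urlencode:}} parser function does the equivalent job). The function name ia_upload_url is hypothetical; the base URL and the iaId/commonsName/format parameter names are copied from the broken link above, not from any IA Upload documentation.

```python
from urllib.parse import urlencode

# Base URL of the IA Upload "fill" form, as it appears in the broken link above.
IA_UPLOAD_FILL = "https://ia-upload.toolforge.org/commons/fill"


def ia_upload_url(ia_id: str, commons_name: str, fmt: str) -> str:
    """Build an IA Upload link with every query parameter percent-encoded.

    urlencode() escapes spaces (as '+') and reserved characters such as
    parentheses, so the URL no longer ends at the first space in the title.
    """
    query = urlencode({"iaId": ia_id, "commonsName": commons_name, "format": fmt})
    return f"{IA_UPLOAD_FILL}?{query}"


url = ia_upload_url(
    "cu31924092218191",
    "The Jesuit relations and allied documents (Volume 27)",
    "pdf",
)
print(url)
```

The spaces become + and the parentheses become %28/%29, so the whole title stays inside the URL instead of leaking into the link text.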

Thanks! Languageseeker (talk) 02:29, 13 April 2021 (UTC)

with new per index css

I am guessing that with the new css setup, that we can stop doing per row align right like in Page:A catalogue of notable Middle Templars, with brief biographical notices.djvu/293 and do something with td:nth-child or per column. If we can, could you create something in Index:A catalogue of notable Middle Templars, with brief biographical notices.djvu/styles.css and I can take it from there. Guessing that we are creating classes. I know that we looked at it many many years ago however our css was too dated at that time. It will definitely make for simpler coding at a work level. Thanks if you can help. — billinghurst sDrewth 07:31, 14 April 2021 (UTC)

@Billinghurst: There you go! Mostly just a regex replace for things like {{ts|ar}} and making the letter headings into headers (using !). Inductiveload—talk/contribs 07:43, 14 April 2021 (UTC)

If we can set the last or the right-most column, what are your thoughts on a global class of align last column right? It is common enough to make global, and people set all the other formatting as needed around it through a work's css. — billinghurst sDrewth 11:11, 14 April 2021 (UTC)

So global table classes is something I have thought about for a very long time. {{table class}} (not my work) attempts to deal with some of the common cases. The issue is a multi-way balancing act between simplicity of expression, simplicity of markup, simplicity of implementation/maintenance and robustness.

I have some concerns with how {{tc}} is implemented, e.g. because classes like _bt are basically just an indirect way to say {{ts|bt}}. CSS shines where we can leverage selectors like descendants and :nth-child. For example __grid is something that cannot be done without very verbose inline styling.

There's also a risk of de-semanticising (ironic use of a non-word intended!) things as well as providing a wide surface for fragility. For example, the following would render the same:

{| class="somework_somedata"
|... index CSS provides the grid and margin:auto rules here, which apply to the "_somedata" type of tables.
|}

{{table class/import}}
{| class="__grid __floatc"
....
|}

However, the latter is a stylistic intent (basically used as a shorthand for hundreds of {{ts}} calls), while the former is a semantic statement of the form "this table is a somedata table" and the styling is a natural consequence of that, with the class identity→styling mapping performed by the index CSS on a work-local basis. Which is perhaps a slightly sophistic argument, and out-of-touch with how we currently do things (mostly out of technical necessity) but I think it's certainly one to bear in mind before we storm ahead and scatter-gun thousands of quasi-global classes into tables all over the place.

To take the Templars index as a more concrete example, the question is do we prefer having a complete semantic→styling mapping in the index CSS (as we do now, more or less) or write something like class="last-col-ar heading-larger-centered margin-auto" and compose the styling out of many pre-made blocks (and still if anything can't be an off-the-shelf class we'll need to use index CSS to fill in the gaps).

Inductiveload—talk/contribs 16:48, 14 April 2021 (UTC)

"The World and it's People"

https://archive.org/details/books?query=%22World+and+its+People%22&sin=&and%5B%5D=mediatype%3A%22texts%22&sort=date

You uploaded Vol 12 recently, so I created {{World and its People}} in case other volumes get uploaded by other contributors.

I note about 12 volumes in the series? ShakespeareFan00 (talk) 10:16, 14 April 2021 (UTC)

@ShakespeareFan00: Lol, that was a random upload to test IA-Upload could duplicate a PDF! Inductiveload—talk/contribs 16:51, 14 April 2021 (UTC)

Auto advancement of POTM

Just wondering how long it normally takes the POTM to advance once the current one is fully validated. Languageseeker (talk) 15:57, 15 April 2021 (UTC)

Until someone changes the current value to something greater than 1. And before you ask, I don't really think this should be automated, because even if the index is validated, it may not be transcluded properly. With great care you could make an attempt at using the index page info like {{index transcluded}}, but since I'm sure there'll be ways for that to fall over in hilariously unforeseen ways and go unnoticed, it's probably easier to just change current manually. Inductiveload—talk/contribs 17:18, 15 April 2021 (UTC)

You know me too well. Seems my mind was confused. Hope good progress will be made on one of the greatest poetic works of the Harlem Renaissance. Languageseeker (talk) 19:28, 15 April 2021 (UTC)

Help Scraping Books from BL

I was wondering if you knew of any easy way to scrape all 127 quartos from the BL. The link is [3]. Then it goes to https://www.bl.uk/Treasures/SiqDiscovery/UI/PageMax.aspx?strResize=yes&strCopy={{id}}&page=1 Languageseeker (talk) 03:53, 17 April 2021 (UTC)

You can scrape the images via URLs like https://www.bl.uk/TreasuresImages/shakespeare/mid/ham-1603-22275x-bli-c01/ham-1603-22275x-bli-c01-009.jpg. Only the last number needs to be changed. With a bit of experimentation, you may be able to figure out the maximum page count, or just iterate until you hit a 404. For determining the ham-1603-22275x-bli-c01 "slug" for each work, you can load the PageMax.aspx page and pick out the #uiPageImage element.
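A minimal Python sketch of that approach, using only the standard library. It assumes, from the single example URL, that every work follows the .../mid/&lt;slug&gt;/&lt;slug&gt;-&lt;NNN&gt;.jpg pattern with a three-digit page number starting at 1, and that the #uiPageImage element is an &lt;img&gt; whose src points at such a URL. All function and class names are hypothetical.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser

# Assumed URL pattern, generalised from the single example image URL:
# .../mid/<slug>/<slug>-<three-digit page number>.jpg
BL_IMAGE = "https://www.bl.uk/TreasuresImages/shakespeare/mid/{slug}/{slug}-{page:03d}.jpg"


class PageImageFinder(HTMLParser):
    """Record the src of the element whose id is "uiPageImage" (assumed to be an <img>)."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("id") == "uiPageImage":
            self.src = attributes.get("src")


def slug_from_pagemax(html):
    """Extract the work slug from the HTML of a PageMax.aspx page."""
    finder = PageImageFinder()
    finder.feed(html)
    if not finder.src:
        raise ValueError("no #uiPageImage element found")
    # The slug is the name of the directory that holds the image file.
    return finder.src.split("/")[-2]


def page_url(slug, page):
    """URL of one page image of a work (assumed pattern)."""
    return BL_IMAGE.format(slug=slug, page=page)


def download_pages(slug, first_page=1, max_pages=1000):
    """Save page images one by one until the server answers 404.

    Returns the number of pages saved; max_pages is only a safety cap.
    """
    saved = 0
    for page in range(first_page, first_page + max_pages):
        try:
            with urllib.request.urlopen(page_url(slug, page)) as response:
                data = response.read()
        except urllib.error.HTTPError as err:
            if err.code == 404:
                break  # ran past the last page
            raise
        with open(f"{slug}-{page:03d}.jpg", "wb") as out:
            out.write(data)
        saved += 1
    return saved
```

For example, download_pages("ham-1603-22275x-bli-c01") would try ham-1603-22275x-bli-c01-001.jpg onwards and write each image to the current directory until the first 404. If a work turns out to number its pages from 0, pass first_page=0.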

As before, ScanTailor is the tool of choice for splitting and de-skewing the double-page photos. Inductiveload—talk/contribs 11:16, 19 April 2021 (UTC)

Thanks for the advice, but that sounds a bit too technical for me. I don't know what a slug is. I tried using ScanTailor, but it produced a mess: I tried to automatically select the images and set uniform margins, and it produced a horribly degraded result. Languageseeker (talk) 20:14, 19 April 2021 (UTC)

Review Activity of Billinghurst

Can you please review the activity of Billinghurst on Hamlet (Shakespeare). Not only has he removed valuable external scan links, but he's also removed existing scan links. Languageseeker (talk) 03:00, 18 April 2021 (UTC)

Also, this comment he left on my talk page. User_talk:Languageseeker#You_are_all_over_the_place._Finish_your_works. Such conduct would be considered abusive when coming from a user let alone an administrator. Languageseeker (talk) 04:38, 18 April 2021 (UTC)

Beware the Jabberwookie!

The first line of the first stanza.

This line belongs to no stanza.
Neither does this.

But this gets a shiny new stanza.
So long as we don't tickle the Jabberwookie.
It is much worse than the Jabberwock,
since it will wrap you up in peas.

I don't think, offhand, that this is fixable. So workarounds that come to mind are either magic syntax or a fake hr specifically for ppoem that uses markup that will not anger the Jabberwookie. Having also looked (very slightly) at {{…}} and {{

- - }} in ppoem lately, I'm inclined to think we may need a small suite of the most common stuff specifically for use in ppoem, as an alternative to magic syntax.

PS. It also just occurred to me, that if we're to look at the extension route, the natural place to start is going to be SyntaxHighlight rather than poem. --Xover (talk) 06:45, 19 April 2021 (UTC)

@Xover: making the stanza a div helped (a bit), but HR inside SPAN is not quite right. Perhaps we should have a magic syntax to drop a line right out of a stanza and start a new stanza after it? Inductiveload—talk/contribs 09:43, 19 April 2021 (UTC)

When the rule is being used as a separator, sure; but here it conceptually is a line. I'm thinking what we need is a line span that just happens to display something that is visually indistinguishable from an actual horizontal rule.

The first line of the first stanza.
—————
This is line three of the stanza.
“Hi there, I'm Stanza 1.4.”

Along those lines. All these types of rules might also need a smaller line-height to look balanced with the lines containing full-height characters. --Xover (talk) 10:03, 19 April 2021 (UTC)

@Xover: how about just {{bar}} then:

The first line of the first stanza.
{{bar|4}}:
————
Keep going...

Inductiveload—talk/contribs 11:04, 19 April 2021 (UTC)

Yeah, that's what I'm doing but bar has… other issues.

Random thought: a special kind of line, a "separator line", that has a smaller line-height, but is most likely going to be populated by a specific template (like bar) rather than by magic syntax? Not at all sure it's worth the effort and complexity, but… --Xover (talk) 13:17, 19 April 2021 (UTC)

@Xover: hmm, fundamentally such a thing can be done with the line class, so it could be {sep} ———.... BTW, which page is "type specimen" for this? Inductiveload—talk/contribs 13:20, 19 April 2021 (UTC)

this. And class would presumably do the job just fine. --Xover (talk) 13:49, 19 April 2021 (UTC)

And speaking of, what's your thought on the canonical way to tweak or disable the hanging indent? 4em is a bit aggressive for this use, and some bits should have none. --Xover (talk) 13:52, 19 April 2021 (UTC)

I guess we could do it as a class on the whole ppoem (and/or per stanza). Since it's an effect on the line, the selector .ws-poem-no_indent .ws-poem-line would work in either case. The DI special-casing would need handling as well.

Sometimes I kinda envy the typesetters of yore who could just put a letter in the right place in the chase and pack it out with reglets. Inductiveload—talk/contribs 14:03, 19 April 2021 (UTC)

Oh, I figured you had an idea of how that should be done. I'll try futzing with pre-defined classes and see how that feels. No indent is probably a shoo-in, but differently hanging indents quickly run into the {{ts}} trap. {{hin}}, if it doesn't interact badly, may be it; otherwise the threat of magic syntax starts looming.

Yeah, traditional typesetting and layout have some definite advantages. And it's not helped particularly by us sitting squarely in the "sweet" spot where we get all the indeterminism and quirks without very much of the dynamism and flexibility. But ppoem raises the bar there, so there's certainly hope for the future. --Xover (talk) 15:04, 19 April 2021 (UTC)

@Xover: A built-in for no indent sounds like a good idea, and maybe 2em might make sense. After that, it probably makes sense for users wanting unconventional indents to supply their own classes, or Template:Ppoem/styles.css will look like a kitchen sink warehouse on stock-take day. Inductiveload—talk/contribs 15:25, 19 April 2021 (UTC)

broken index page <= config

Index:The character and extent of air pollution in Leeds - (A lecture delivered before the Leeds Philosophical Society, on March 3rd, 1896.) By Julius B. Cohen (IA b21534160).pdf has a red link on the status, and you have a double category naming. — billinghurst sDrewth 12:48, 19 April 2021 (UTC)

@Billinghurst: it looks OK to me (https://i.ibb.co/xXxgcpB/2021-04-19-141431-1022x570-screenshot.png), even with a purge? Maybe it was a transient thing while the template and module switched over control of the status? Inductiveload—talk/contribs 13:15, 19 April 2021 (UTC)

No obvious breakage here either, for whatever that's worth. --Xover (talk) 13:18, 19 April 2021 (UTC)

But Index:The Cruise.pdf shows up with redlinked status and a borked category. --Xover (talk) 06:55, 20 April 2021 (UTC)

@Xover: looks OK to me? It could have been borked for a few minutes yesterday - cached? Inductiveload—talk/contribs 07:21, 20 April 2021 (UTC)

It was edited by an IP in the interim (very strange coincidence), which would have reparsed it. --Xover (talk) 08:09, 20 April 2021 (UTC)

IP was SFan not logged in. Beeswaxcandle (talk) 17:19, 20 April 2021 (UTC)

Transclusion status bot run

I'm seeing instances where there is a transclusion template with "yes" and the template is replaced with status marked, but other situations where there is a template that is not removed and the information is not transferred.

In particular, situations where there is a duplicate half-title page (or something similar) that is deliberately tagged and categorized as "not transcluded". Does the bot make a strict check that does not allow for these situations? --EncycloPetey (talk) 03:28, 21 April 2021 (UTC)

@EncycloPetey: Pages should not be affected - this is index templates only. The bot makes no decisions, just copies the status from {{index validated date}} or {{index transcluded}} to the Index page field. Can you link a page you think it has not removed a template? Inductiveload—talk/contribs 03:48, 21 April 2021 (UTC)

Oh, I take it you mean Index:Shakespeare's Sonnets (1923) Yale.djvu? ~~I didn't think there were any using "X" and also the templates, since that was a short-lived "temporary" state. I'll run through and check that (small) batch.~~ Duh, sorry, I've thought of that - any "duplicate" templates will be hoovered up as the two old templates get converted. For a short period, a very small number of pages (~10 I estimate, out of ~13k) may have more than one category. Inductiveload—talk/contribs 03:50, 21 April 2021 (UTC)

That killed my Watchlist. #Killed not killed — billinghurst sDrewth 12:54, 21 April 2021 (UTC)

@Billinghurst: #sorrynotsorry, at least it has a bot flag :-D You want to do the honours and kill off the templates? Inductiveload—talk/contribs 16:50, 21 April 2021 (UTC)

All respective pages have text updated to reflect dropdown field in Index: page. Templates removed. I will let you tell the community of your great progress. — billinghurst sDrewth 05:40, 22 April 2021 (UTC)

Could you review a proposal?

I’ve been working on a proposal to improve the tooltip system used on Wikisource. My draft proposal is at user:Languageseeker/popup. I was wondering if you could review the feasibility/desirability/clarity of my proposal. Languageseeker (talk) 18:10, 21 April 2021 (UTC)

@Languageseeker: I understand where you are coming from. I don't really have the bandwidth to deal with this in detail, but your proposal probably needs to focus a bit more on how you plan to implement this rather than just what the problem is. Immediate technical queries I have with it as written:

"The new system would use css popups": What is a CSS popup? CSS has no innate concept of a popup, since that's structural data, not stylistic. Probably a fully-fledged system would use some kind of rich UI element like OOUI when possible (or maybe it can just be done with mw:Reference_Tooltips right now).
E-readers: not many solutions will work on e-readers since the baseline is a very (very) basic environment and you may not even have a touchscreen, let alone a mouse. Probably the best you can realistically hope for is on-the-fly conversion to footnotes by the exporter. Footnotes generally are supported OK (and use the right epub hinting). Inductiveload—talk/contribs 20:28, 21 April 2021 (UTC)

I was thinking about using something like these two examples [4] [5]. So, it would use CSS to control the display of the popup and how the text is stylized. In this way, these options can be overwritten on a per-text basis. As for ereaders, it would require additional engineering to convert these popups which I don't plan on doing. Thoughts? Languageseeker (talk) 20:51, 21 April 2021 (UTC)

That's a very very rudimentary popup system. MediaWiki includes OOUI, so probably something along the lines of mw:Reference_Tooltips is more practical than a full-custom solution? Inductiveload—talk/contribs 21:04, 21 April 2021 (UTC)

talk archives

To note that your talk archives are somewhat hidden. I have modified {{archives}} to have an automated year function which should assist you if interested. — billinghurst sDrewth 05:26, 22 April 2021 (UTC)

Whoops, quite right. Done. I wasn't sure how to integrate "Archive 1/2" with auto year, so I gave up and did the easy thing. Inductiveload—talk/contribs 13:25, 22 April 2021 (UTC)

upgrades to the index form

The form for the index page prefills in information from the book template. When I move the information to wikidata, the form shows up empty.

So, a nice upgrade is to have it pull from wikidata if it exists.

Everything I know about pulling information from wikidata, I learned here, at the author template. It is so easy.

If I had a computer, I would make a citation tool that would make the publication listings at the author pages.

On a different matter, I have a question about a template at commons. The template gave me errors, but it seemed to work. I was wondering if you could look at it and tell me if it is rendering nicely for you as well. It is in use for volumes of things there commons:Category:Bentley's Miscellany, Vol. 1 for one.

Thanks.--RaboKarbakian (talk) 19:11, 22 April 2021 (UTC)

It can be done, but it'll be a little bit of work to get it done. There aren't many files like that currently, so for now it's not a high priority to me. I'll get to it one day, unless someone else does first.

I don't know what template issue you are referring to, but the page files there seem to at least not have errors. E.g. File:Bentley's Miscellany, Vol. 1-0279.jpg looks OK. Inductiveload—talk/contribs 19:46, 22 April 2021 (UTC)

I was wondering about the volume navigation.

I was very unhappy about an exchange I had at the commons. You should read it carefully and have your own opinion or not of it commons:User_talk:Languageseeker#pdf,_djvu_and_jp2. It wasn't just my words there. I know how I would fix the bot, but I don't like software that does this.--RaboKarbakian (talk) 19:58, 22 April 2021 (UTC)

The navbar template appears functional? I'm not seeing any errors.

I don't know what about that conversion could make you "very unhappy". Regardless, the IA-Upload bot has been modified to only warn if the same identifier has been uploaded before (phab:T269518). I don't like software that does this - I don't know what you mean by that: If you don't like IA-Upload, don't use it? Inductiveload—talk/contribs 20:27, 22 April 2021 (UTC)

In the exchange on that talk page, I read your welcome to me when you were "ready to export" texts here and something I said after that during the DP f2 days. These words were delivered back to me and way out of context in an automated sort of way. I don't like that kind of software. Liza is so late 90s....

I like IAUpload, enough to have learned some of its ways and foibles and I am both sad and happy it is being fixed. I kind of want to cuss or append everything that I say on a talk page with something about not liking chat mining now.--RaboKarbakian (talk) 21:40, 22 April 2021 (UTC)

I'm sorry, I genuinely have no idea what you are talking about, as far as I can tell LanguageSeeker was being helpful and I'm fairly sure they're not a chat bot. Inductiveload—talk/contribs 22:02, 22 April 2021 (UTC)

Monthly Challenge Work

I would really like to get the Monthly Challenge running in May. Since I'm not sure that the Bookworm Bot can get started by then, I've created an alternative page for the project, Wikisource:Community collaboration/Monthly Challenge (2021)/May 2021. There's a nomination project on WS:S#Call for Nomination of Texts that has enough texts already. There are two major tasks remaining. The first involves halving the New Text Box and creating a text box for the Monthly Challenge above it, similar to the one on French Wikisource. This requires an administrator. Can you do that? The second is writing the FAQ, which I plan to get done over the next few days. I'll also create a discussion on the FAQ at WS:S once it's done. Languageseeker (talk) 13:18, 24 April 2021 (UTC)

Wanna take it for a spin?

Whenever you have a moment, could you take this for a spin to check that I haven't massively borked something before we replace MediaWiki:Gadget-ocr.js? Other thoughts and feedback welcome too, of course, should you feel inclined. Xover (talk) 15:11, 23 April 2021 (UTC)

@Xover: not flush with time right now, but I'll try. First impression: not sure it's working for hOCR - I get the "hOCR complete" popup but nothing happens in the text box. I think $('#wpTextbox1').value should be $('#wpTextbox1').val() (or cut out jQuery and do document.getElementById("wpTextbox1").value = ...).

Also probably need to ~~steal from~~ be inspired by the Google OCR JS and add $( "a[rel='wsOcr1']" ).css("width", "45px"); around line 95 to fit the wider button icon. Unless the Wikieditor config hook has a CSS field I don't know about. Inductiveload—talk/contribs 15:54, 23 April 2021 (UTC)

Fixed. --Xover (talk) 10:02, 24 April 2021 (UTC)

Editor

@Xover: As an aside since this is adding buttons, with the "2017" editor now 4 years overdue, do you have any idea what's going on there? Is there anything we should be doing to get ready for an eventual change-over? It doesn't seem to share the configuration hook with the 2010 editor. As usual, there's naff-all in the way of documentation for the new shiny. I don't personally use it and found it incredibly annoying when I tried it, but there are a few tagged edits around, so some people must like it (or they don't know they can turn it off!). Inductiveload—talk/contribs 16:14, 23 April 2021 (UTC)

You mean the wikitext mode of the Visual Editor, I presume? There's actually been some movement on that just recently, triggered by CommTech's use of Parsoid for ws-export, in that the VE team has apparently now given this enough thought that they've concluded they need a Parsoid-native version of ProofreadPage to make this work. I think that opens the possibility that they'll give it some attention eventually. But the bad news is that I don't think our use case is even on their radar, so all the docs and exposed functionality assume you're a core MW dev employed by the WMF. It is definitely possible to hook into VE, both visual mode and wikitext mode, in all sorts of ways, but I've found jack-all that's usable for Gadget or user script developers. Which reminds me… no, on second thought, I'll have to dig up some links for that rant. Later. --Xover (talk) 16:41, 23 April 2021 (UTC)

PRP and Parsoid came up in phab:T274654#6946964 and resulted in phab:T278481.

The other links I wanted to dump were mw:ResourceLoader/ES6 and T237688, T178356, and T75714. The long and short of which is… MW and the WMF are now feeling the pain of not using ES6 so much that they're willing to drop Grade A support for IE11 and start requiring it for new core features, but have not as yet given any priority or allocated resources to developing the necessary validator for end-user accessible code (like gadgets) to be able to use ES6. This is kinda concerning since Krinkle, who as they mention in one of the comments is the most likely person to do the work, jokingly estimates it won't get done until 2030. I'm not sure how to approach that because, as they say, it isn't a trivial job, and it's hard to point at anything ES3 makes actually "impossible" or that ES6 suddenly makes possible; it's more the same kind of pain that leads the WMF itself to want to move the bar.

And speaking of the utter neglect of Gadget coders… Are you familiar with Vue at all? I've never really looked at it but from a quick peek it looks to exist for / appeal to 1) people who are caught up in the now-fashionable "let's all dump on jQuery" fad, 2) people who are religiously convinced React is the light and the truth, solves all problems, brings world peace, and is therefore the one true way, and 3) people with a genuine need to build full blown applications. What I am not seeing is a library of pre-made UI widgets ala jQuery UI—for which OOUI was already a hideously complex replacement—or anything else that would help us make robust, functional, consistent gadgets with modern and user friendly UI. Sigh. --Xover (talk) 11:42, 24 April 2021 (UTC)

Oh good, we're on the same page. The total neglect of Gadget coders is really quite frustrating. There are, AFAIK, a grand total of zero documents explaining best practices for gadgets (specifically: configuration, deployment and code sharing), and you get short shrift in chat channels for asking. The OOUI help pages are deserts for answers (and still have the wrong IRC link on them, where you get an earful for being in the wrong channel). The OOUI "manual" is hilariously short on use cases, and I'm still short of a tab-completing selection box that actually reduces keystrokes for finding something (for User:Inductiveload/quick access.js, I had to roll my own).

I did ask for a way to at least ask ResourceLoader to load specific deps locally (phab:T278304) because otherwise you can't really test "JS libraries" without finding and disabling all other clients of the library. If the core site JS uses such a library, that's not very practical. "Test it on a local wiki" was the answer given. I'm working on a workaround but it's fugly (think web server which regexes JS on the fly ugly).

RE Vue, I have no idea WTH is happening, or if there's anything useable for Gadgeteers, if there ever will be, or even if that's planned. I am vaguely planning to use Vue for a future Toolforge thing just so I'm not totally blindsided when it comes along (but Vue + Bootstrap, since I have no earthly clue where WVUI is at, or even if one can use it for anything right now). I think the idea for Vue is that it's good for "incremental" use, so you can drop in a few Vue widgets without having to drink the whole cup of SPA koolaid. As you say, all one needs is a small library of widgets to play with. And at least it can't be much worse than OOUI in terms of verbose boilerplate needed, eh?

Anyway, if I get something working before you do, I'll let you know. Inductiveload—talk/contribs 22:23, 26 April 2021 (UTC)

Wikidata requires paranoia

cf. Hans Andersen's Fairy Tales/The Top and Ball and probably a smattering of other pages, and this change. Code like this needs to be not only defensive but downright paranoid: you're dereferencing a complex datastructure that is directly end-user editable with zero input validation. Any and all levels in that datastructure may be missing or contain garbage. As I recall, the last time I ran into this issue I had to walk the tree level by level and do a nil check for every one. Xover (talk) 13:01, 24 April 2021 (UTC)

@Xover: I think (though I could well be wrong) that the datastructure isn't totally user-editable. In this case, a missing datavalue field (together with snaktype=somevalue) means it's "unknown value". Checking for the item datatype (wikibase-item, though if the calling code said "edition of", that's a given) and a present datavalue should be enough? (bedtime reading: https://doc.wikimedia.org/Wikibase/master/php/md_docs_topics_json.html) Inductiveload—talk/contribs 22:52, 26 April 2021 (UTC)

nil-checking datavalue is probably enough here, yeah (cf. below).

You're right that the whole structure isn't user-editable, but you can easily run into logical inconsistencies like a qualifier value for a property with unknown value. My main point is that in dealing with Wikidata we need to armorplate and program defensively as a rule, much more than with any other semi-structured data source. Pulling from MW's DB tables, for example, we can assume a certain level of consistency because it's enforced on input by the software in a high-level UI. Wikidata barely validates syntax, much less any kind of real consistency, and at the same time lets end users change data in what is essentially the "database" layer. It's kinda scary. (remind me to rant about semantics and information modelling some day; some day when you have lots and lots of free time…)

In any case, I dug up the previous instance I was thinking of: Armoring against bad data on Wikidata. It's been a while, but superficially it looks like a very similar type of issue, which probably means nil-checking the datavalue is the pattern to extract from this.

Oh, and that doc was useful. I've been looking for that kind of thing and have failed to find it. I think we need to start thinking about what we can do in terms of gadgets to make our WD integration better and more user friendly, without crossloading code from some dude's user space on a different language project… --Xover (talk) 08:58, 27 April 2021 (UTC)
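A minimal sketch of the "nil-check every level" pattern, written in JavaScript as a gadget might use it (the module under discussion is Lua, so this illustrates the pattern rather than reproducing the fix that was made). It assumes the entity JSON layout from the Wikibase JSON documentation linked above: claims keyed by property ID, each with a mainsnak whose snaktype is "value", "somevalue" or "novalue", where only "value" snaks carry a datavalue. The function name getItemIds is hypothetical.

```javascript
// Collect item IDs (e.g. "Q42") for one property from a Wikibase entity object,
// treating every level of the structure as possibly missing or malformed.
function getItemIds(entity, propertyId) {
  const claims = entity && entity.claims && entity.claims[propertyId];
  if (!Array.isArray(claims)) {
    return [];
  }
  const ids = [];
  for (const claim of claims) {
    const snak = claim && claim.mainsnak;
    // "somevalue" (unknown value) and "novalue" snaks have no datavalue at all
    if (!snak || snak.snaktype !== "value") {
      continue;
    }
    const datavalue = snak.datavalue;
    // Only entity IDs are wanted; strings, dates etc. are ignored
    if (!datavalue || datavalue.type !== "wikibase-entityid") {
      continue;
    }
    const id = datavalue.value && datavalue.value.id;
    if (typeof id === "string") {
      ids.push(id);
    }
  }
  return ids;
}
```

With this shape, an "edition or translation of" (P629) statement set to "unknown value" is simply skipped instead of raising an error, and a missing or garbage claims list yields an empty result.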

fr:Module:Index_template

Guessing, not presuming, that you have had a look at what frWS is now doing with their Index: page template. Either way, waving it under your eyes. — billinghurst sDrewth 08:30, 26 April 2021 (UTC)

@Billinghurst: I have coveted their WD barcodes and I will eventually acquire them for our own nefarious purposes. Inductiveload—talk/contribs 22:07, 26 April 2021 (UTC)

We could probably persuade Tpt to give us a rundown of what the module does, how it fits into their larger architecture, and what the drivers were. I'm thinking the coding part of this stuff isn't so hard, it's more an issue of figuring out how it should work, what are the externally imposed limitations, etc. But then, the way WD is set up is fundamentally incompatible with the way my brain is wired, so maybe that's just me. :) --Xover (talk) 09:02, 27 April 2021 (UTC)

@Xover: I can probably figure most of it out, it's mostly a question of "what we actually want it to do" (other than bling-bling which is of course a noble aspiration). The biggest issue for me is that WD is actually not always as good as you would think/hope at the kind of bibliographic metadata we actually would want on an index page - particularly for things like volumes (and that's before we get to periodicals).

Furthermore, even if I could figure out representing sub-edition data (like volumes), there's another level to it: Index pages exist in a kind of unhappy no-man's-land between WD edition items and Commons SDC data - while the file represents the edition, the instantiation of that representation has its own properties (e.g. pagination, missing pages, scan quality, scan provenance, etc etc) that can and do vary even within the same edition. This is, I think, the "Item" level of FRBR (Group 1), and it is where one of the primary disconnects between what WD promises it could be and what it actually is to Wikisource comes about. Inductiveload—talk/contribs 09:19, 27 April 2021 (UTC)

Now, I could be wrong; however, it still relies on data population at WD, and on that happening prior to the work being done here. We have struggled to get good WD compliance here at a work/article's creation, and I know that I generally do it just the once when I have transcluded. The prime issue for me is that the push from Commons to WD has no easy semi-automated tool to either push or check data. Similarly, the push from our Index pages to WD is not connected. — billinghurst sDrewth 11:37, 27 April 2021 (UTC)

Well, certainly without the data being present somewhere, it's all a disaster. However, we currently have no one place we can put all the data, or even a solid idea of how to split data between Commons, WD and WS index pages in the general case. Which has led to "some people" (i.e. me) saying "sod it" and just not bothering too much. I try to get the file metadata for scans to be reasonable, but since I don't even know what best practices are, I leave it at that for files.

For "easy" works like a single volume novel, WD can hold pretty much all of it in the work/edition structure, and only things like the pagelist need to be done at the file (or Index) level. For things in the mainspace, it's also easier, as that generally can traverse an edition or translation of (P629) statement and suck data out of the work-level item (where things like "topic" likely reside). The "item" level of the FRBR system kind of falls away at that point, because we kind of isolate that behind the Index/Page:Mainspace division.

It's possible there's a good way to do this for a general book (e.g. a volume of a set, or a periodical issue), but I haven't been able to work it out yet. And sadly, the periodical thing especially is probably where WD can most help us deal with the enormous amount of bibliographic data represented by the contents of a periodical.

I am working on a WS -> WD item creator (working title: Wikidata Creata :-D), but it's not done (or half done) yet because writing gadgets is such a huge PITA with the tools we are given. So I'm considering doing it all in a separate web-app on Toolforge. Inductiveload—talk/contribs 11:52, 27 April 2021 (UTC)
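A minimal sketch of the traversal described above: look for a property on the edition item first and, if it is absent, follow "edition or translation of" (P629) to the work item. It assumes the entities are already loaded into a plain object keyed by item ID (in practice they would come from the Wikidata API), and the function names and sample data are hypothetical. "Main subject" (P921) stands in for "topic".

```javascript
// Return the first usable item ID for a property, or null; every level is checked.
function firstItemId(entity, propertyId) {
  const claims = entity && entity.claims && entity.claims[propertyId];
  if (!Array.isArray(claims)) {
    return null;
  }
  for (const claim of claims) {
    const snak = claim && claim.mainsnak;
    // Skip "somevalue"/"novalue" snaks, which carry no datavalue
    const datavalue = snak && snak.snaktype === "value" ? snak.datavalue : null;
    const id = datavalue && datavalue.value && datavalue.value.id;
    if (typeof id === "string") {
      return id;
    }
  }
  return null;
}

// Prefer the edition's own statement; otherwise fall back to the work item
// reached through "edition or translation of" (P629).
function getWithWorkFallback(editionId, propertyId, entities) {
  const edition = entities[editionId];
  const own = firstItemId(edition, propertyId);
  if (own !== null) {
    return own;
  }
  const workId = firstItemId(edition, "P629");
  return workId === null ? null : firstItemId(entities[workId], propertyId);
}
```

Keeping the edition-level value first means a deliberate per-edition override still wins over whatever the work item says.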

the setting up of a journal

I checked the tables of contents. Ingoldsby goes to VOLUME 17! Honestly, I was just twiddling thumbs with Rackham. Bentley's is more like interrupting thumb twiddling by playing with a hangnail. I don't like my computer being hacked. It is (a simile is about to happen, different from string literal) like having people jump in your car and go with you, expressing opinions, making rules and interrupting -- just with their presence. When my computer didn't boot, (and when my home directory was being mounted via nfs or ntfs, they mounted it "Non Executable") it was like (simile) they decided to park my car in their garage and I don't even get to know where or who.

I am trying not to spin this one way or another. I am trying this for years! Closer to 5 years since the non-executable fiasco than 1. So, I was going to use my compromised computer and work on something that I like but don't have great cares for. Arthur Rackham. Very wonderful illustrator; his creepy is cute.

Apologies for the rant. My inkscape friend does perl. I do python. WS is lua?

I was going to start to set up Bentley's Miscellany here when I saw Wikisource:WikiProject Popular Science Monthly. I came here and ranted. They have an "engine". At data, I have separated scans & index pages from Main.

RL is calling, there is no wrong answer: journal? Yes or no.--RaboKarbakian (talk) 15:06, 27 April 2021 (UTC)

I'm sorry, I do not understand what you are saying. The first half makes no sense to me. The second half seems like you'd like to set up a periodical at WS for Bentley's?

The first step towards that is probably gathering a list of all the volumes and scans into Bentley's Miscellany. If you're going to use the Internet Archive SIM scans (which look like they are split into 3 each, TOC, content and index), that might make life a bit harder for you. I don't really have a good suggestion for a better option other than scraping the Hathi scans or checking over the Google scans at the IA (which look pretty poor). Inductiveload—talk/contribs 15:31, 27 April 2021 (UTC)

Jumping into this conversation. I also had plans to create a page for Bentley's Miscellany. In addition to the Princeton scans, Toronto also has full color scans of some of the volumes. For the volumes on HathiTrust, is there any way that you can batch download the Princeton set and then upload them to Commons without compression so that the images can easily be extracted? Languageseeker (talk) 18:31, 27 April 2021 (UTC)

The first rant here is about why I choose different things than what I really want to work on. If my words are being mined, then always including a displeasure about being hacked works for me.

I played with pulling in information from wikidata for about 20 mins, it has been a couple of years since I did this.... You can see what I got at Bentley's Miscellany. The upper portion is paste. The data calls (also edited paste) are under "pulled".

The scans are at commons commons:Category:Bentley's Miscellany. Best to change your preferences to "cats on top" there for easier navigation.--RaboKarbakian (talk) 19:13, 27 April 2021 (UTC)

Expanding use of preload to something on a per work basis

For some of our compilation works I would like to better utilise MediaWiki:Gadget-TemplatePreloader.js somehow for people working on these compilations.

A Biographical Dictionary of Modern Rationalists will be using the header adaptation Template:BDMR and it (now) has a /preload. I would like to have scope to put in this template rather than "header" utilising the path of the work. I would also like to see if we can look to do something similar with a range of other compilation works. I would hope that we wouldn't have to do it by editing the preloader.js itself but somehow leverage it by having a "by work" configuration file, json, something!

Nothing urgent urgent as I can do a templatescript that I can set up to do a replacement, but to make all these compilation works easier, better, uniform I see that it is our next evolution. If we can have it configurable outside of the javascript it gives great flexibility and ownership. Thanks for your consideration. — billinghurst sDrewth 02:54, 19 April 2021 (UTC)

@Billinghurst: hmm, yeah, so the idea is a (very) good one, but I'll need to think a bit about the implementation. JSON is probably a good call, as long as it actually loads as JSON when AJAX'd. I'll give it a poke at some point. Inductiveload—talk/contribs 09:08, 19 April 2021 (UTC)
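A minimal sketch of how a "by work" configuration might be consulted. Everything here is hypothetical: the config shape, keying it by work title, and the function name; only the BDMR entry comes from the request above. In a gadget the object would presumably be a JSON page fetched and parsed at load time; this sketch takes the already-parsed object and picks the template for a page from the path of the work.

```javascript
// Hypothetical per-work configuration (would live in a separate JSON page).
const preloadConfig = {
  default: "header",
  works: {
    "A Biographical Dictionary of Modern Rationalists": "BDMR",
  },
};

// Pick the preload template for a page title: the longest configured work
// title that is the page itself or one of its ancestors, else the default.
function choosePreloadTemplate(pageTitle, config) {
  let best = null;
  for (const work of Object.keys(config.works)) {
    const matches = pageTitle === work || pageTitle.startsWith(work + "/");
    if (matches && (best === null || work.length > best.length)) {
      best = work;
    }
  }
  return best === null ? config.default : config.works[best];
}
```

Matching on the work title plus "/" avoids false hits on other works whose titles merely start with the same words, and keeping the mapping outside the script means it can be edited without touching the preloader itself.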

Less urgent, I remembered how I did the override ... Template:Editnotices/Group/A catalogue of notable Middle Templars, with brief biographical notices. Leaves a bit of a fat header, however, good enough for just a simple template and transclusion where the fat editnotice matters less. The (ugly) things that you forget when you haven't done them for years, and how sad is it when you are talking in terms of many years. :-/ — billinghurst sDrewth 12:45, 28 April 2021 (UTC)

Even betterer, I have set up some group documentation that can be applied at Template:Editnotices/Group/doc — billinghurst sDrewth 13:04, 29 April 2021 (UTC)

9 years isn't even the record...

You may like to be aware of phab:T41510, both for yourself and if anyone else runs into it. There are workarounds through the API, asking a dev to clear it, etc. should that be needed. Xover (talk) 18:47, 29 April 2021 (UTC)

Dezoomify Question

I'm trying to use dezoomify to get the images that are in the PD off The Jane Austen Manuscript website. However, I get the following error:

ERROR: Could not open ImageProperties.xml (<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)>).

URL: https://images.cch.kcl.ac.uk/austen/liv/zoomify/vol_the_second/C8275-08/ImageProperties.xml

Are you familiar with the program or know how to fix this error? Languageseeker (talk) 20:29, 30 April 2021 (UTC)

@Languageseeker: It's been a long time since I used it. https://dezoomify.ophir.dev works, which suggests dezoomify-rs would work too? Inductiveload—talk/contribs 22:55, 30 April 2021 (UTC)

Thanks. I tried -rs and it didn't work. I guess that I might have to do it image by image through the web interface. Languageseeker (talk) 02:57, 1 May 2021 (UTC)

Add link to source on Index Pages

One thing that I noticed on frWS that would be helpful here was links to the original source on the Index page. Perhaps, we could import the links from Commons with the other metadata? This makes it much easier to fetch the full resolution images from IA. Languageseeker (talk) 04:39, 2 May 2021 (UTC)

We could do this, but it's one more thing to keep in line. It's possible that some kind of Commons SDC or Wikidata approach (probably via Wikisource index page URL (P1957)), i.e. building out an Index page SDC/WD infrastructure, might be more sensible. Though I am personally hazy on when to use SDC and when to use WD. There's also User:Inductiveload/jump to file which will link you directly to the source file or hi-res JPG at the IA (or Hathi, or some others), and also give you the option of loading a high-res image into the ProofreadPage pane for when the scan thumbnail is a bit rubbish (PDFs are especially bad at this). Inductiveload—talk/contribs 23:08, 2 May 2021 (UTC)

I see. The user script is great. Would there be any way to get it to upload the high-resolution image to Commons? I think that high-resolution images really come into play when trying to upload better quality images. Languageseeker (talk) 00:33, 3 May 2021 (UTC)

@Languageseeker: A WS-optimised uploader is on my middle-term list, but it won't (directly) be part of this script (though the script might direct you to a prefilled upload form). Uploading raw originals will be part of that.

In any case, the upstream original isn't that useful at WS in the general case, because it usually needs non-trivial processing to remove the paper colour and tidy up defects. It's nice to have the original at Commons, but it's not something we in general will present to readers. Inductiveload—talk/contribs 18:59, 3 May 2021 (UTC)

Stats color for 100 to 200 range

There’s a highlight color for <100 and one for >200. Could you add one for between 100 and 200? Languageseeker (talk) 00:17, 3 May 2021 (UTC)

My thinking is that highlighting isn't highlighting if everything is highlighted. The idea is only to call out things that have special meaning: falling short of a "minimum" level (bad) or exceeding some "maximum expected" level (good). Obviously, we don't yet have a good handle on our expected levels so 100/200 are made-up thresholds.

On that note, if we start hitting 200 per day on a regular basis, we should bump the limits so that we 1) keep the hounds snapping at heels by raising the "lower" bar and 2) raise the upper bar so we have a meaningful "this day was exceptionally good" signal rather than allow the whole table to turn green. Inductiveload—talk/contribs 19:05, 3 May 2021 (UTC)
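A minimal sketch of the two-threshold rule described above: only days outside the 100 to 200 band get a highlight, so an ordinary day stays plain. The label strings and function name are hypothetical; the actual stats table markup is not shown here.

```javascript
// Provisional thresholds; both are expected to be raised as activity grows.
const LOW_THRESHOLD = 100;  // below this: falling short of the minimum (bad)
const HIGH_THRESHOLD = 200; // above this: an exceptionally good day

// Return a highlight label for one day's count, or null for an ordinary day.
function highlightFor(countForDay) {
  if (countForDay < LOW_THRESHOLD) {
    return "low";
  }
  if (countForDay > HIGH_THRESHOLD) {
    return "high";
  }
  return null;
}
```

Bumping the bars as described is then just a matter of changing the two constants, and the "exceptionally good" signal stays meaningful.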

Monthly Challenge - volume's image

Hello. Could you tell me how to change the cover of The French Revolution (Volume 1) on the Monthly Challenge page from 5 to 9 (as the index displays)? I can't figure out how to do it. Ratte (talk) 08:08, 4 May 2021 (UTC)

@Ratte: Sure: like this (at Module:Monthly Challenge/data/2021). It's slightly weird, but it's the only way I can make it so that the MC grid can update automatically without needing a bot to run constantly to keep it up to date. I will be writing some central docs on how to work the MC infrastructure once I'm sure it does actually work (it's looking "OK" so far)! Inductiveload—talk/contribs 08:31, 4 May 2021 (UTC)

Thank you! Ratte (talk) 08:42, 4 May 2021 (UTC)

Transclusion Question

Can you take a look at The Works of H. G. Wells (Atlantic Edition)/Volume 1/The Time Machine/Chapter 1 and see why Chapter 2 is being included? Languageseeker (talk) 19:47, 7 May 2021 (UTC)

@Languageseeker: sure - the problem was that tosection is inclusive. Thus you needed a TM1 section to use as the tosection parameter. Inductiveload—talk/contribs 22:06, 7 May 2021 (UTC)

Monthly Challenge Sprint

Would you mind switching over the sprint to Science? Languageseeker (talk) 20:59, 9 May 2021 (UTC)

@Languageseeker: done. Inductiveload—talk/contribs 21:04, 9 May 2021 (UTC)

Setup a nude class for Template:authority/link and Template:authority/lkpl

Hi. As is possible with a.mw-disambig I would like to get some css class(es) on these links so I can visually track the use of these links in works (per User:Billinghurst/common.css). And you know that extended css is not my strength so would you mind doing that? Thanks. — billinghurst sDrewth 01:57, 11 May 2021 (UTC)

@Billinghurst: They now have ws-authlink and ws-authlkpl classes respectively. I haven't done per-field classes like {{article link}} because it's quite a bit of faffing in template mode (vs module mode) and doesn't have an immediate use. At least now, you can do

.ws-authlink, .ws-authlkpl { color: green; }

or whatever other style tickles your fancy. Inductiveload—talk/contribs 06:38, 11 May 2021 (UTC)

Thanks, I am hoping that it is useful for visual link check diagnostics. — billinghurst sDrewth 12:16, 11 May 2021 (UTC)

NIH, or...

Add a pinch of caution on this one. This may be just NIH, but there are several red flags here for me. For one, we're only just now hearing about this, when the project is essentially finished (cf. the grant proposal: it ends in May). The project was to involve lots of community consultation, with "the Wikisource community", but it looks like they've only talked to the Punjabi Wikisource (the home wiki for at least one of those involved). There are significant differences between the language projects that may make a single-project setup untenable for other language projects. And while better Wikidata integration is awesome, it needs very careful design and management: e.g., what happens when an arWP editor hops over to Wikidata and changes the fields that are used on our Index pages? Or changes it locally on arWP and some set of tools makes the change also happen on Wikidata. Or a bot does a new monster import of Worldcat, with zero verification of the data. Or… All the projects have different policies (as enWP found to their detriment) for things like verifiability, conduct, conflict resolution, etc. And different priorities, which is a perfect recipe for conflict. Technologically crufty as our current indexes are, we own that data and can make policies for them, and patrol changes to them. Once we outsource them we have zero control. And what happens once this grant is exhausted? "Wikidata integration isn't really supported; it's really buggy and nobody is likely to fix it any time soon."

So maybe a more succinct summary is: if you find good stuff there then we should certainly crib what we can, but on our terms and we can't just assume that because someone got a grant to make something it is necessarily good for us. And a lot of the coolest potential stuff for WD<->WS integration needs community buy-in and policy changes, not just technical plumbing. Bah, humbug! :) Xover (talk) 20:32, 11 May 2021 (UTC)

@Xover: I agree 100%. Mostly they were vague on exactly which Wikisourcen they were even talking about. I can turn grumpy and lawn-territorial later on, as needed ^_^. Besides, unless they have a global interface admin and/or GS on side and willing to egregiously abuse those rights, they'll need someone with the bits locally anyway.

As it is, eventually I'm long-term hoping for a frWS-kind of affair for Index page WD, perhaps with more twiddly bits, perhaps not. But until the Datazoids 1) figure out how we're supposed to record, in particular, per-volume data for multi-volume works, and 2) write that down somewhere incredibly clearly in words of one syllable for my dumb ass (or someone figures out what the Commons Structured Data is actually for and tells me that's what I should be using and then tells me how), I find myself suffering from a Vitamin Care deficiency. Inductiveload—talk/contribs 20:42, 11 May 2021 (UTC)

autopatroller

Undertaking this should not be necessary for autopatrolled users, especially where it has been asked on their user talk page repeatedly. — billinghurst sDrewth 14:32, 11 May 2021 (UTC)

@Billinghurst: What's wrong with how it was before? Those four lines are not four paragraphs, they're a single "semantic" line with manual breaks, so <br/> is valid? Inductiveload—talk/contribs 14:39, 11 May 2021 (UTC)

Aaagh, this one https://en.wikisource.org/w/index.php?title=Nixing_the_Fix&diff=prev&oldid=11272197 (sorry was going to bed). — billinghurst sDrewth 22:28, 11 May 2021 (UTC)

and fwiw when using the +er-blocks they do need to be separated by blank lines rather than pulled closer together with BR, as otherwise the text is not suitably separated and it overlaps (in Firefox). — billinghurst sDrewth 22:33, 11 May 2021 (UTC)

@Billinghurst: Hmm, that's not ideal, indeed.

I don't see anything wrong with the -er-blocks in Firefox. If the line heights are wrong on some platform or something, we need to address that, as line wrapping doesn't only happen with manual BR tags. Does the lorem ipsum in the docs at {{xxxx-larger}} do it as well? Inductiveload—talk/contribs 22:38, 11 May 2021 (UTC)

I have no idea whether it is platform specific or not, just saying that I am seeing it that way in my browser. The (larger)+er blocks (usually above x-) are typically used for display text like title pages, rather than standard body text as can happen with the (smaller)-er blocks, where we need that line spacing to contract like it does neatly. So in the display pages we can just throw formatting at it and have it display nicely. So if shoving in blank lines or BR doesn't matter because the look is okay, do it to suit the required display. Changes now are going to impact a lot of pre-existing pages where it is display for display's sake. — billinghurst sDrewth 22:49, 11 May 2021 (UTC)

Right, but it's not good if one user thinks something displays properly and another see it broken. For example, with BR, this is what I see: https://i.ibb.co/p2zTM9B/2021-05-11-235139-523x531-screenshot.png. Which is (pedantically) more like the original, not that it really matters. I have used BR-in-largers on title pages before, so it's not good if it turns out they are coming out busticated for some.

Do you see any difference here between 1 and 2? Inductiveload—talk/contribs 22:56, 11 May 2021 (UTC)

Oh yeah. phab:F34451074 and attached you there. — billinghurst sDrewth 02:38, 12 May 2021 (UTC)

@Billinghurst:, OK I have harmonised the larger-block templates to set the line height to the same as the smaller ones (1.4), which isn't needed for me, but apparently is for you, despite us both being Firefoxers. It should now be functional to have a line break in these templates (forced with BR or natural). Thanks for the screenshots, very helpful. Inductiveload—talk/contribs 13:54, 12 May 2021 (UTC)

phab:F34451086 the original page series in screenshot. — billinghurst sDrewth 02:48, 12 May 2021 (UTC)

podcast user:Jon (WMF)

https://www.podbean.com/ea/dir-ahhwj-e446592 Worth a listen to Between the Brackets — billinghurst sDrewth 04:48, 12 May 2021 (UTC)

Thank You

I saw the amazing work that you did on the MC-Cover Template and I wanted to send you a quick thank you. It looks wonderful. Languageseeker (talk) 18:31, 27 April 2021 (UTC)

@Languageseeker: ...and now the listing is automagic - all you have to do is keep it topped up with works, and it'll take care of putting them in the right sections (as long as we have under about 500 active—i.e. immune or < 3 months old—works, which I think won't be an issue!). Still no way to have a page counter without a bot or similar, but at least this avoids a daily need to check statuses and risk works stagnating in the wrong sections. Inductiveload—talk/contribs 20:32, 27 April 2021 (UTC)
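The actual listing logic lives in the Monthly Challenge Lua modules and is not reproduced here. This JavaScript sketch only illustrates the rule as stated: a work is active if it is immune or was added less than three months ago. The record fields (title, added, immune) and function names are assumptions.

```javascript
const MAX_AGE_MONTHS = 3;

// Hypothetical work record: { title: string, added: Date, immune: boolean }
function isActive(work, now) {
  if (work.immune) {
    return true;
  }
  // Cutoff is the same day of the month, three months before "now" (UTC)
  const cutoff = new Date(now.getTime());
  cutoff.setUTCMonth(cutoff.getUTCMonth() - MAX_AGE_MONTHS);
  return work.added > cutoff; // strictly younger than three months
}

function activeWorks(works, now) {
  return works.filter((work) => isActive(work, now));
}
```

Because the status is derived from dates at render time, nothing has to be moved by hand (or by a bot) when a work ages out.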

The page looks so beautiful. It’s more than I could have hoped for. Once we reach 500 books a month, we can make it the Weekly Challenge (hee-hee). Is there any way to see texts that have only a few pages to validate or are mostly proofread? I also want to use this as a means of finishing texts. Simply outstanding work. Languageseeker (talk) 00:43, 28 April 2021 (UTC)

@Languageseeker: Glad you like it!

Re "Is there any way to see texts that have only a few pages to validate or are mostly proofread?": not really, this is what we need a bot or phab:T281195 for. Once the proofread percentage data is either sitting in a module (or directly available via some magic tag), we can work it into the tiles for a quick overview. Daily stats like frWS's will almost certainly need a bot - I don't see that being built into the core any time soon (though it would be pretty nifty if it were).

I feel like the near-live stats are part of what drives the frWS project; without the feedback, people drift back to their own domains. Inductiveload—talk/contribs 00:51, 28 April 2021 (UTC)

I think that stats are only part of what drives users to contribute to frWS. If you look at Distributed Proofreaders, users contribute even when they only see the number of books completed every month. In my opinion, most users simply don't know what to do when they arrive on the site. Right now, we basically tell users to pick a project, any project. However, most volunteers just want to be given a specific task. They also want to contribute to something important. So, I hope that by putting attractive texts on the Monthly Challenge, the site will grow its user base. That is why I'm selecting texts that have broad recognition or look cool. My other hope is that by creating scan-backed copies of important works, we will attract more users to our site. Our key benefit is that we can combine scan-backed proofreading with the generation of ebooks in a way that allows for future improvement of formatting by relying heavily on templates. That's the key selling point of this site.

I also hope that by creating a Monthly Challenge, it will become easier to reach out to GLAM institutions. Most GLAM institutions simply don't have the money to proofread texts. Scanning is cheaper, but they don't want to contribute scans if they'll just grow moldy in the back pages of WS. With the Monthly Challenge, we can tell GLAM institutions that if they provide us with the scans, then we'll feature them on the Monthly Challenge and get them proofread. Right now is a great time to reach out to GLAM institutions because the demise of Flash has had a devastating impact on GLAM websites. Combined with the pandemic, most GLAM institutions are choosing to either take them down or leave them broken. enWS can reach out as a new home for these scans.

On a separate technical note, I've noticed that the box containing the PoTM vanishes if the site window gets too small. Maybe it would make sense to make sure that the PoTM and Monthly Challenge boxes do not vanish on small displays? Also, would it be possible to create a way to easily monitor the talk pages of all the Indexes in the Monthly Challenge to see if a user asks a question there? Languageseeker (talk) 02:52, 30 April 2021 (UTC)

Re "key selling point": that's my view too.

Re monitoring all talk pages, this should probably be a script to add all MC index talk to your watchlist, and probably is not that hard. I'll look into it.

Re small screens, I think the reasoning is that the proofreading UI is so unwieldy on a mobile screen that there's not much point directing mobile users to working spaces anyway. I think we could at least advertise a bit, once we have some progress to advertise, but it'll need some thought to make it useful/interesting to mobile users. Inductiveload—talk/contribs 22:52, 30 April 2021 (UTC)
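Along the lines of the watchlist script suggested above, here is a minimal sketch. It assumes the list of Monthly Challenge Index: pages is already known (fetching it is not covered), and `indexTalkTitles` and `monthlyChallengeIndexes` are hypothetical names. `mw.loader.using` and the `watch` method of `mw.Api` belong to MediaWiki's client-side JavaScript; the guard lets the pure helper also run outside the wiki.

```javascript
// Hypothetical helper: map Index: page titles to their Index talk: pages.
function indexTalkTitles(indexTitles) {
  return indexTitles
    .filter((title) => title.startsWith("Index:"))
    .map((title) => "Index talk:" + title.slice("Index:".length));
}

// Hypothetical placeholder: fill in the Index: pages currently in the challenge.
const monthlyChallengeIndexes = [];

// Only runs on the wiki itself, where the global `mw` object exists.
if (typeof mw !== "undefined" && monthlyChallengeIndexes.length > 0) {
  mw.loader.using("mediawiki.api").then(() => {
    // Adds every talk page to the current user's watchlist.
    return new mw.Api().watch(indexTalkTitles(monthlyChallengeIndexes));
  });
}
```

For long lists, the API's per-request title limit may apply, so batching could be needed.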

For the small screens, I meant when you resize your desktop browser window, the PoTM box vanishes. On frWS, it gets moved to the bottom in a long column, while on enWS it simply vanishes. I can understand why it's omitted on mobile, but on desktops, you can resize the window. Languageseeker (talk) 02:59, 1 May 2021 (UTC)

Oh yeah, good point. We actually can target the mobile front page differently. I'll take a look (at some point). Inductiveload—talk/contribs 11:40, 1 May 2021 (UTC)

Take your time. I was also thinking that it might be good to add some stats to the front page template like the French do. It can go between the line about the number of works and the sprint listings.

Column 1: Mission 2000

Column 2: Results of 2021: May 2021 (total pages validated or proofread) (percentage of 2000) ((daily change) pages)

Thoughts? Languageseeker (talk) 12:47, 1 May 2021 (UTC)
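The proposed stats line is simple arithmetic: a percentage of the target and a day-over-day difference. Here is a minimal sketch. It assumes "Mission 2000" is the monthly page target and that the totals come from some other stats source. `formatChallengeStats` is a hypothetical name, and the figures in the usage line are made up.

```javascript
// Assumption: "Mission 2000" means a target of 2000 pages per month.
const MONTHLY_TARGET = 2000;

// Hypothetical formatter for the proposed front-page stats line.
function formatChallengeStats(monthLabel, pagesDone, pagesYesterday) {
  const percent = (100 * pagesDone) / MONTHLY_TARGET;
  const dailyChange = pagesDone - pagesYesterday;
  const sign = dailyChange >= 0 ? "+" : ""; // negative numbers already carry "-"
  return `${monthLabel}: ${pagesDone} pages validated or proofread ` +
    `(${percent.toFixed(1)}% of ${MONTHLY_TARGET}) (${sign}${dailyChange} pages)`;
}

console.log(formatChallengeStats("May 2021", 850, 800));
// → "May 2021: 850 pages validated or proofread (42.5% of 2000) (+50 pages)"
```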

This should be possible once we have a few stats to use. Year-to-date stats will obviously only make sense from next month, and total-to-date only from next January. The whole stats system still needs a bit more tweaking before it becomes a hands-off automatic thing (for a start, moving to a Toolforge backend).

On the front-end, feel free to mock it up at {{Collaboration/MC/sandbox}} using fake figures and I can use that as inspiration when I have a suitable stats-processing function to wire up to it. Inductiveload—talk/contribs 14:25, 1 May 2021 (UTC)

Done Languageseeker (talk) 15:30, 1 May 2021 (UTC)

@Inductiveload: Any updates on implementing this? Languageseeker (talk) 20:06, 13 May 2021 (UTC)

@Languageseeker: {{Monthly Challenge statistics}} is now a thing (it's currently showing the average-to-date of the month), but it will need more work to handle the monthly roll-overs for things like day-1 stats. I think we'll see a few things falling over on the 1st. Inductiveload—talk/contribs 20:09, 13 May 2021 (UTC)

I would be interested in stats on click-throughs to the sub-pages of works that are expected to be read-through. Or clicks from wikipedias to their little source sisters. CYGNIS INSIGNIS 15:16, 1 May 2021 (UTC)

We can get pageview data for pages via {{annual readership}}, but figuring out how users get to our pages and how they traverse them would possibly start running into issues with data collection fairly quickly (at least, you'd have to be very careful the data is fully anonymous, and even then I'm not sure of the rules on WMF sites). Inductiveload—talk/contribs 20:39, 1 May 2021 (UTC)

New WMF toy in testing

Special:Preferences#mw-prefsection-betafeatures, enable "Discussion tools". Beside being really convenient both for replies and new threads, it eliminates a whole lot of "woops, forgot to sign" issues. I mention this for no particular reason. At all. Just completely randomly. :) Xover (talk) 08:35, 5 May 2021 (UTC)

@Xover: Gosh, a useful feature in the Beta section! ^_^ Inductiveload—talk/contribs 08:50, 5 May 2021 (UTC)

I know. Shocking, isn't it? :) Xover (talk) 09:24, 5 May 2021 (UTC)

Aurp. You may also want to go into the Appearance section and uncheck "Use Legacy Vector". Not due to its awesomeness, but to be aware of and be prepared for what's (apparently) coming. Lots of good ideas, some less good, and an at-times really janky implementation. I suspect a lot of the impedance comes from two main factors: the team prioritising readers over contributors, and a basic assumption that all content on the wiki is user generated (true for almost all the other projects) and uniform across pages (also true for most other projects).

I have good experiences with some of the people involved (receptive to feedback etc.), but it's probably going to take some effort on our part to affect anything here. Xover (talk) 07:57, 14 May 2021 (UTC)

@Xover: hmm, yeah I've tried that before and it doesn't look great. However, "fixing" the centred column formatting, if we wanted to do so, is probably "only" a matter of futzing with CSS here and there, so it's "probably" OK. I guess it's a matter of how much the current state is WIP and how much is actually considered "done".

Then again, some limiting of content width is also probably not a totally insane idea, since a 120em page (1920/16 = 120) is pretty excessive in most cases. But almost certainly the Page: NS editor might need special casing.

Anyway, since from the timeline on the project page it looks like this is some time away for us, I'm not sure there's a whole lot to do here right now. If we are sulkily disinclined to make any real effort, we can also let some other Wikisourcen take the early-adopter hit and only opt in once they've ironed out the wrinkles. I suppose collecting a CSS "fixes" file as a CSS gadget (targeting .skin-vector:not(.skin-vector-legacy) or similar) might be helpful when petitioning the reskin project for mercy?

For example, this removes the most egregious mis-feature, IMO: the 9em margin-left on #content:

.skin-vector:not(.skin-vector-legacy) #content {
    margin-left: 0;
}

Inductiveload—talk/contribs 12:30, 14 May 2021 (UTC)

A lot of stuff is effectively getting locked in now, so if we want to affect anything we need to start talking to them sooner rather than later. And that will require familiarity with the current state of it and their roadmap ahead, and, perhaps even harder, having some idea of what we want. It is also an opportunity to get things done if we have pain points in this area. Right now there is attention and assigned resources, which, well, you know how things usually go when that's not the case. Xover (talk) 12:44, 14 May 2021 (UTC)

@Xover: Probably would help if I checked User:Inductiveload/vector.css wasn't causing chaos with the new layout before I was too rude about it in public... >_< Inductiveload—talk/contribs 13:48, 14 May 2021 (UTC)

on page numbers

You mentioned these recently and I wanted to blather on about them; I think they are a crucial addition to the site. I try to show by example, but don't remember where they are: this is what I would have shown, The year's at the spring, which I still think works; the other being an original (i.e. printed, analog) index conveniently linked by page numbers (not hundreds of anchors). Hope this was of interest. CYGNIS INSIGNIS 21:38, 9 May 2021 (UTC)

@Cygnis insignis: are you meaning the toggles in the left toolbar in the "Display Options" section? — billinghurst sDrewth 01:59, 11 May 2021 (UTC)

@Cygnis insignis: thanks for the note. I do wonder about linking only page numbers because it seems counter-intuitive to me - it "feels" like they're going to take me to the Page NS, though I imagine a user unfamiliar with the site wouldn't have that feeling.

More importantly (to me): on export this means the TOC text is not clickable (and the user agent may not make it clear where the link is).

Furthermore, the hint to the export tooling of the name for the entry in the "EPUB TOC" (the one you get when you "view book structure", as opposed to the one in the "content", which replicates the original) is the link text, so using the page number makes it somewhat unclear what is going on.

Sorry to be such an export bore! Inductiveload—talk/contribs 06:52, 11 May 2021 (UTC)

I hadn't given that much consideration; although I also think that is important, I hadn't expected the problems you outline. I suppose there are many potential hazards. I'll go off and learn about what's going on with EPUB, etc., and maybe pick this thread up again when I better understand the export side of things. Thanks for the info, and for the couple of times you helped solve a problem recently (I kept forgetting to say "that worked a treat, cheers"). Have a good one. CYGNIS INSIGNIS 07:23, 11 May 2021 (UTC)

Oh, in indexes. user:Phe did quite a string of those for me back in the year, eg. The Elizabethan People/Index. Do we need to do anything different with them for export? — billinghurst sDrewth 12:14, 11 May 2021 (UTC)

@Billinghurst: no, that works fine - the #XXX links work just fine in the EPUB (and they make sense to me as the clickables, since one index entry can have n pages). It's the TOC where the exporter is actually using the link text to construct its own idea of what the entries in the document's structure are called. Inductiveload—talk/contribs 12:37, 11 May 2021 (UTC)

I'd forgotten about something, and having reread that just now I recognise it was inappropriate for me to post here. Cheers anyway for the replies. CYGNIS INSIGNIS 00:05, 15 May 2021 (UTC)

Page:The Works of H G Wells Volume 1.pdf/7

Okay, where did the Index: page, the file information and the actual scans get out of step? The hi-res scans this is looking for bear no resemblance to the ones in the PDF/DjVu for the nominal page numbering. Suggestions are welcome because I don't like playing hunt the glitch. ShakespeareFan00 (talk) 23:49, 14 May 2021 (UTC)

@ShakespeareFan00: Mea Culpa, this is actually the UC copy not the UM copy. Sorry about that. The UC has the British printing and UM has the US printing. Updated as appropriate. Languageseeker (talk) 00:40, 15 May 2021 (UTC)

Also, the hi-res versions weren't available when I uploaded the volumes. I left a message for them and they seemed to fix them. If someone wants to replace the low-res version with a higher-res version, feel free. Languageseeker (talk) 00:43, 15 May 2021 (UTC)

Thanks... The script seems to be appending an additional 0 to the page numbering for some reason. Is something doing a string concatenation when it should be doing an addition? ShakespeareFan00 (talk) 07:23, 15 May 2021 (UTC)
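The symptom described matches a common JavaScript pitfall: `+` concatenates when either operand is a string, so a page number read from a URL or a page attribute as `"7"` plus an offset of `0` becomes `"70"`. Here is a minimal illustration with hypothetical function names. The actual script is not shown here, so this only demonstrates the suspected bug class.

```javascript
// Hypothetical: pageNumber arrives as a string, e.g. parsed out of a URL.
function nextPageBuggy(pageNumber, offset) {
  // "7" + 0 === "70": string concatenation, not numeric addition.
  return pageNumber + offset;
}

function nextPageFixed(pageNumber, offset) {
  // Convert explicitly to a base-10 integer before doing arithmetic.
  return Number.parseInt(pageNumber, 10) + offset;
}

console.log(nextPageBuggy("7", 0)); // "70"
console.log(nextPageFixed("7", 0)); // 7
```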

Index date validated not working

I’ve validated these indexes but the index validated date is not working: Index:Asian Infrastructure Investment Bank Act 2015.pdf and Index:Poems (Edward Thomas, 1917).djvu. Both indexes are fully transcluded. Could you please have a look at this? My OS is Mac OS and my browser is Opera. I also tried viewing with Safari and it's still the same. --kathleen wright5 (talk) 02:10, 15 May 2021 (UTC)

@Kathleen.wright5: You need to add the "month year", what is there as grey text is an example of the format. Inductiveload, we may need a "for example ..." prepend in the field. — billinghurst sDrewth 02:22, 15 May 2021 (UTC)
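The grey text in the field is presumably placeholder text, which is never saved as the field's value, so the field stays empty until something is typed in. Here is a minimal check of whether a value has the "month year" shape the field asks for. It assumes English month names and a four-digit year. `isMonthYear` is a hypothetical helper; the real Index form does not necessarily validate this way.

```javascript
const MONTH_NAMES = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
];

// Hypothetical check: true for values like "May 2021".
function isMonthYear(value) {
  const match = /^([A-Za-z]+) (\d{4})$/.exec(value.trim());
  return match !== null && MONTH_NAMES.includes(match[1]);
}

console.log(isMonthYear("May 2021")); // true
console.log(isMonthYear(""));         // false: an empty field has no date
```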

Dislike DHR

I dislike having DHR in works that I do, especially when I can just as well space them out with hard returns and not get code bloat and impact readability. So I am not sure why you are replacing them with {{padded page break}}. I especially see DHR used way more than necessary when we can just be adding clean clear space. Am I missing something? — billinghurst sDrewth 05:54, 17 May 2021 (UTC)

There are a couple of reasons I would put for why I think {{ppb}} is better (and why I'm not just randomly screwing about):

Multiple hard returns in the code actually result in "stacked" P tags that contain a single BR each in the output. This is 1) semantically wrong as there are no structural paragraphs there at all and 2) this relies on the rather inconsistent way MW throws out P tags, BR tags, and how the inter-paragraph margins stack up. For example:

3 blank lines between foo and bar: <p>foo</p> <p> <br> </p> <p> <br> bar </p>
4 blank lines between foo and bar: <p>foo</p> <p> <br> </p> <p> <br> </p> <p>bar</p>

This means that the actual gap you get doesn't scale linearly in number of lines, because every odd number of lines collapses one BR into the last P tag (or at least it does right now, but because P-wrapping is a mess, who knows if this behaviour is a safe long-term bet):

[Rendered examples: "bar" following 1, 2, 3, 4 and 5 blank lines, showing the uneven gaps.]

Secondly, multiple blank lines in Wikitext are fragile not only because MW makes no hard guarantees about how it's going to handle them, but also because the editor intention behind them is not always clear. How important is it that there are 2 blank lines here? Does the editor mean they want exactly one P tag containing exactly one BR tag (for a 0.5em + (1.6 * 1em) total effective gap)? Or did they actually just mean they want "some" gap? By contrast, {{ppb}} is explicit: this is a page break with padding around it. Actually, I think all visible page breaks should get padding, and then we wouldn't need {{ppb}} at all, because you very rarely want two pages jammed right up against one another, IMO.
- For that matter, {{dhr}} has some small advantage as well, especially on title pages: there is (hackish) CSS in the EPUB export which limits the height of a DHR to 100%, because massive stacks of multiple P tags usually result in a random division, with some P's on one page and the rest on the next, depending on how many P's there are and how big the reader screen/font-size is. Furthermore, DHR is an explicit "this div makes blank space" structural element with a class that allows targeting of CSS (which is how the EPUB exporter does that). Bare P tags have no such intention marker.
Because the P tags and the page-break DIV (which is what actually causes a page break on export) are disconnected siblings, you end up with free-floating P tags on each side of the break on export. This is currently true of {{ppb}} too, but...

The implementation of {{ppb}} certainly was lacking, and DHR-PB-DHR is the wrong solution for it (even if it is brutally functional). What needs to happen is that {{page break}} should have some parameter that allows adjusting this. This was on my "make exports moar bettar" list, but I hadn't gotten to it yet. Well, I have now; it wasn't as hard as I thought. Inductiveload—talk/contribs 08:32, 17 May 2021 (UTC)
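Here is a rough model that makes the gap arithmetic concrete. It treats the "0.5em + (1.6 * 1em)" figure quoted above as the extra space each empty `<p><br></p>` adds over an ordinary paragraph break. It also assumes that a `<br>` merged into the start of the next text paragraph adds one line box but no margin, and that adjacent paragraph margins collapse. These CSS values are assumptions, not read from the live skin, and `countSpacers`/`extraGapEm` are hypothetical names.

```javascript
// Assumed CSS values (from the figures quoted above).
const PARA_MARGIN_EM = 0.5; // collapsed margin between adjacent <p> elements
const LINE_HEIGHT_EM = 1.6; // one line box holding a lone <br> at 1em

// Count empty "<p><br></p>" paragraphs and <br>s merged into a text paragraph.
function countSpacers(html) {
  let emptyParas = 0;
  let mergedBrs = 0;
  for (const match of html.matchAll(/<p>([\s\S]*?)<\/p>/g)) {
    const body = match[1].trim();
    if (body === "<br>") {
      emptyParas += 1;
    } else if (body.startsWith("<br>")) {
      mergedBrs += 1;
    }
  }
  return { emptyParas, mergedBrs };
}

// Extra gap (in em) beyond an ordinary paragraph break.
function extraGapEm({ emptyParas, mergedBrs }) {
  // An empty paragraph adds a margin plus a line box; a merged <br> adds only a line box.
  return emptyParas * (PARA_MARGIN_EM + LINE_HEIGHT_EM) + mergedBrs * LINE_HEIGHT_EM;
}

const threeLines = "<p>foo</p> <p> <br> </p> <p> <br> bar </p>";
const fourLines = "<p>foo</p> <p> <br> </p> <p> <br> </p> <p>bar</p>";
console.log(extraGapEm(countSpacers(threeLines)).toFixed(1)); // "3.7"
console.log(extraGapEm(countSpacers(fourLines)).toFixed(1));  // "4.2"
```

Under these assumptions the fourth blank line adds only 0.5em rather than another full line, which is the non-linearity described above.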

Also, similar to the (new) {{ppb}} without DHR escorts, {{section end rule}} provides a similar thing for a rule. By adding a dedicated class, default padding and width (and any other CSS) can be applied through index-level CSS, so you don't need the {{dhr}}{{rule}}{{dhr}} idiom (or anti-pattern, depending on how grumpy one feels), you just need {{ser}}.

The idea is similar: allow the user to express what they mean, as well as avoiding spraying out 3 separate HTML elements that might or might not even end up on the same page. Inductiveload—talk/contribs 13:07, 17 May 2021 (UTC)

Hah! I hate terminating section rules too—unneeded book artefact like hyphens, etc. For me they fall between sections and should not be transcluded. :-) — billinghurst sDrewth 06:58, 18 May 2021 (UTC)

@Billinghurst: First, not all {{ser}}s have to be at the end of a transcluded page.

Second: remember, a single template which adds a class + per-index CSS can do this: .ns-0 .wst-section-end-rule { display: none; }. No dummy sections needed. Inductiveload—talk/contribs 08:17, 18 May 2021 (UTC)

Sure, or just not get wedded to thinking that 19th-century book compositing needs to be 21st-century computer presentation. The word is king. — billinghurst sDrewth 10:52, 18 May 2021 (UTC)

@Billinghurst: I don't follow. The point is this allows it to not be part of the presentation, without having to faff about carefully arranging sections or futzing with no-includes. Slap a {{ser}} down, add suitable CSS (once only) and it's done - no section-end rules in mainspace. Inductiveload—talk/contribs 10:56, 18 May 2021 (UTC)

Characters required for Old English keyboard

Hi Inductiveload, thanks for offering to make this; it would be really great to enable quick proofreading of these texts by specialists and non-specialists alike. So this would require the following characters:

1) all 26 characters from the Modern English alphabet

2) the following characters from the O.E. alphabet:

eth: Ð,ð
thorn: Þ,þ
wynn: Ƿ,ƿ

3) A,a E,e I,i O,o U,u and Y,y with macrons, as this is used editorially to distinguish long from short vowels

Thanks so much in advance. Rho9998 (talk) 12:38, 17 May 2021 (UTC)

@Rho9998: It should now be available in your Wikieditor "Special Character" palette: screenshot. I didn't add the 26 normal letters because they're all on a keyboard and it'll just add clutter I think.

Let me know if you think of any other characters to chuck in :-) Inductiveload—talk/contribs 12:59, 17 May 2021 (UTC)

@Inductiveload: Nice one. Well done on including the Tironian note (⁊) and the shorthand for þæt (ꝥ) as I forgot to mention them. That reminds me that on the punctuation front there's also the interpunct (·) which was used in the original manuscripts (although most editors modernise the punctuation). And checking Wikibooks I'm told there's also g with macron to abbreviate the prefix ge-. This further reminds me that there's g and c with dots on the top (ċ,ġ) used by some editors to mark when they're soft. Sorry for forgetting these but I hope that should be all of them now.

If you're feeling so inclined, you could also add runes. These are used sometimes in O.E. but I'd imagine they'd be useful for other languages and could merit their own keyboard.

@Rho9998: OK, added. I put the runes into Old/Middle English, but they could also go to their own panel. ¯\_(ツ)_/¯ Inductiveload—talk/contribs 13:42, 17 May 2021 (UTC)

@Inductiveload: So can Wikisource transliterate the special characters directly, or will I have to go through and add in every one? This is as far as I am with my first attempt: Index:Hargrove1902alfred'soldenglishversionofaugustine'ssoliloquies.djvu

@Rho9998: You mean, in terms of the OCR? It looks like the Google OCR tool recognises this as Old English (even though the IA OCR clearly did not). Instructions: Help:Gadget-ocr.

and þæt ic mage geearnian þæt ic sī wurðe þæt dū më
for dĩnre mildheortnesse ālyse and gefrēolsige. Ic clypie to
þë, Drihten, Þū be æall geworhtest, þæt þe æalles ge-
weorðan ne mihte, në æac wunian ne mihte būtan þë.

If all OCR fails, your only options are to proofread by hand from first principles, or find a matching text to copy-paste from.

BTW, it might be clearer to name the file something like "King Alfred's Old English version of St. Augustine's Soliloquies - Hargrove - 1902.djvu"? I can move the file if you like? There's nothing technically wrong with it as it is, it's just a bit of an eyeful. Inductiveload—talk/contribs 15:44, 17 May 2021 (UTC)

@Inductiveload: Re the file name, sorry about that, I didn't know how to change it. I think the OCR has read it as modern English because the first part is, or because I've put en as the language? I'll familiarise myself with how the OCR works. Thanks for the link.

@Rho9998: Ta-da: Index:King Alfred's Old English version of St. Augustine's Soliloquies - Hargrove - 1902.djvu.

Re the OCR, this is nothing to do with you. The OCR is "baked into" the file at the source (the Internet Archive). The Google OCR button sends an image to some Google cloud thing and it regenerates it from scratch. Whatever Google does seems to notice when it is fed Old English, whereas whatever the IA did failed to notice (or was set to English only). The "normal" black OCR button just returns whatever OCR is in the file if it can, so that's what you get.

There is a current technical project to make Wikisource OCR processes a bit better, but for now, the Google button is probably your best bet. Inductiveload—talk/contribs 15:59, 17 May 2021 (UTC)

@Inductiveload: Yeah I see what you mean now; I didn't see the big black and then multicoloured OCR buttons: doh! I agree Google OCR comes out much better. Unfortunately it looks like it's read a lot of the macrons as either diaereses or circumflexes. Would I be able to do a 'replace all' for the whole OCR-ed text? Thanks for the file name change. Rho9998 (talk) 16:07, 17 May 2021 (UTC)

@Rho9998: You can add the following Javascript (just copy-paste) to Special:MyPage/common.js. Then you should see a button marked "OE OCR fixes" in the sidebar on the left:

$.ajax('//tools-static.wmflabs.org/meta/scripts/pathoschild.templatescript.js', {
  dataType: 'script',
  cache: true
}).then(function() {
  // page NS
  pathoschild.TemplateScript.add(
    [
      // OE OCR fixes
      {
        name: 'OE OCR fixes',
        position: 'replace',
        script: function(editor) {

          // replace most diacritics with macrons
          var replacements = [
            [ /[àáäâã]/g, 'ā' ],
            [ /[èéëêẽ]/g, 'ē' ],
            [ /[ìíïîĩ]/g, 'ī' ],
            [ /[ùúüûũ]/g, 'ū' ],
            [ /[òóöôõ]/g, 'ō' ],
            [ /[ỳýÿŷỹ]/g, 'ȳ' ],
          ];

          for ( var i = 0; i < replacements.length; ++i ) {
            editor
              .replace(replacements[ i ][ 0 ], replacements[ i ][ 1 ]);
          }
        }
      },
    ], {
      category: 'main',
      forNamespaces: ["Page"]
    }
  );
} );

Code like /[ỳýÿŷỹ]/g is a w:regular expression (see also here); the string after it is what it is replaced with. Basically, any character found inside the square brackets will be replaced with that one. The g means it will do it as many times as it can in the text. Run it after you use the Google OCR button.

You can add more replacements as you need, or come and ask me to make a regex to solve something (I'll need example input and output text, as well as any examples you can think of where the replacement should not be made).

Don't be shy about editing your own JS: you can't damage anything and if it stops working, you can just go back in the edit history to an old version :-). Inductiveload—talk/contribs 16:26, 17 May 2021 (UTC)
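Here is a standalone sketch of the same replacement idea for trying regexes outside the wiki, for example in a browser console or Node. `applyOeFixes` and `OE_REPLACEMENTS` are hypothetical names, and the u row covers ù, ú, ü, û and ũ. Normalising to NFC first is an added assumption: it makes OCR output that uses combining accents (a base letter followed by U+0308, for instance) match the precomposed characters in the brackets.

```javascript
// Each entry: [regex of OCR-misread vowels, macron vowel to substitute].
const OE_REPLACEMENTS = [
  [/[àáäâã]/g, "ā"],
  [/[èéëêẽ]/g, "ē"],
  [/[ìíïîĩ]/g, "ī"],
  [/[ùúüûũ]/g, "ū"],
  [/[òóöôõ]/g, "ō"],
  [/[ỳýÿŷỹ]/g, "ȳ"],
];

function applyOeFixes(text) {
  // Compose combining accents first so the character classes can match them.
  let result = text.normalize("NFC");
  for (const [pattern, replacement] of OE_REPLACEMENTS) {
    result = result.replace(pattern, replacement);
  }
  return result;
}

console.log(applyOeFixes("for dĩnre mildheortnesse ālyse"));
// → "for dīnre mildheortnesse ālyse"
```

Like the script, this only handles lowercase vowels; uppercase rows could be added the same way.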

@Inductiveload: That is amazing, thank you! I've also copied and pasted the code and then edited the duplicate so that there's the option of replacing all wynns (ƿ) with thorn (þ) for texts where wynn is changed to 'w' anyway - you can see why the OCR would get them confused. As this text has the Latin source in the footnotes I won't add one to replace 'p'-s with thorns as that would cause more problems than it solved! I've also added ash and its accented variants by copying and adapting one of your lines of code - and this does make me realise that sometimes there are macron ash-s so if you feel like adding it to the keyboard please do, though note it's not a major hurdle for the instant as most ash-macrons are automatically inserted by the fixes you wrote. Rho9998 (talk) 17:44, 17 May 2021 (UTC)

@Rho9998: glad it's working out for you! Ǣ is already there, next to Ā. I noticed it while testing the replacements. :-) Inductiveload—talk/contribs 17:55, 17 May 2021 (UTC)

Fixing the Transclusion of Index:Works of Jules Verne - Parke - Vol 5.djvu

When this volume got transcluded, it seems that not enough care was taken. Can you move all the pages to Works of Jules Verne/Volume 5/. Also, for Works of Jules Verne/The Mysterious Island, the Chapters should be moved to Works of Jules Verne/Volume 5/The Mysterious Island/Dropped from the Clouds/ e.g. Works of Jules Verne/Volume 5/The Mysterious Island/Dropped from the Clouds/Chapter 1 Thanks. Languageseeker (talk) 20:12, 17 May 2021 (UTC)

We don't need the volumes in this situation, they are just physical artefacts and make navigation difficult. Having as WORKS / SUBWORKS is perfectly fine; remainder of the work is done that way, and we should continue that way. Moved the subpages of the second work (after effing it up the first time). It seems that we will need AuxTOC in various places to properly generate the download output. Not certain whether it should be at the root page, or in subpages like Works of Jules Verne/The Mysterious Island. — billinghurst sDrewth 08:02, 18 May 2021 (UTC)

Re "Not certain whether it should be at the root page, or in subpages": you can have both. Add the root page and subpage to the category Category:Ready for export. Note: the TOCs on the subpages must be inside a container classed ws-summary. This can be {{AuxTOC}}, {{TOC begin}}, or, if you are wrapping things manually, {{export TOC}} or {{hidden export TOC}}. Inductiveload—talk/contribs 08:21, 18 May 2021 (UTC)

My comment was more about the structure of the entire work. Not enough sampling to make a recommendation, so I put out the reminder. — billinghurst sDrewth 10:48, 18 May 2021 (UTC)

General question, from https://en.wikisource.org/w/index.php?title=Works_of_Jules_Verne/The_Mysterious_Island/Dropped_from_the_Clouds/Chapter_1&action=history&curid=3639653 are you able to select a line, and edit/create a tag via the button? I just get fatal errors, though my rights are such a combination who knows what they think they are doing. — billinghurst sDrewth 16:43, 18 May 2021 (UTC)

@Billinghurst: [af12dbf3-a0ff-4f91-bf0c-0a812f3bb62b] 2021-05-18 16:50:35: Fatal exception of type "Error" :-/ Inductiveload—talk/contribs 16:51, 18 May 2021 (UTC)

Hmm. @1234qwer1234qwer4: as someone without advanced rights locally though good mw knowledge, can you please tell me what you can do with these tags. Thanks. — billinghurst sDrewth 01:24, 19 May 2021 (UTC)

Bug reported, bug fixed. All good, sorry to be a bother. — billinghurst sDrewth 03:10, 19 May 2021 (UTC)

author = unknown

In olden times we used to document an author = Unknown option in the header. From what I can see, the template's instructions no longer mention it. It still seems to work with the display logic; however, we are getting the equivalent of links to Author:unknown. Not sure when that started, and the when isn't really important. Is it an easy fix in your code? If not, then I will run my bot through to convert these to | override_author = Anonymous and the corresponding categorisation. There are about 600.

We also had a case where someone was using the override and still had author = author, so there was a magic load of links pointing there, and the categorisation. Those I have fixed, and I have explained to the contributor how to handle it. — billinghurst sDrewth 15:34, 19 May 2021 (UTC)

Meh! And Author:not mentioned from translator field — billinghurst sDrewth 15:38, 19 May 2021 (UTC)

@Billinghurst: author = unknown does work and categorises. E.g. An Index Expurgatorius. It could link to Portal:Anonymous texts if you want (the old header template did not do this, so I followed it).

For translator and editor, both fields also special-case "Unknown" in the same way (including suitable categories). Inductiveload—talk/contribs 15:43, 19 May 2021 (UTC)

They are shadow author linking. I am seeing them categorised to category:Works with non-existent author pages — billinghurst sDrewth 15:46, 19 May 2021 (UTC)

@Billinghurst: ...or are they? diff. So many corner cases in the headers! Inductiveload—talk/contribs 15:54, 19 May 2021 (UTC)

I am not understanding the assertion. — billinghurst sDrewth 15:57, 19 May 2021 (UTC)

@Billinghurst: ...I fixed it. "unknown" in the author (or editor or translator) field no longer categorises to category:Works with non-existent author pages. Inductiveload—talk/contribs 16:01, 19 May 2021 (UTC)

Thanks. And "not mentioned" in translator? — billinghurst sDrewth 16:25, 19 May 2021 (UTC)

Chalk up another tin of Campbells to the face. See, for example The Voyages and Adventures of Captain Hatteras. Inductiveload—talk/contribs 16:41, 19 May 2021 (UTC)

Beaut, that should get rid of another swag. Got the category down by about 1500 today.

Comment Some days it is like slightly opening a cupboard door and out come a thousand cans of tomato soup on your head. Followed by wikispelunking special:whatLinksHere/author:author= author <deskthunk> — billinghurst sDrewth 16:31, 19 May 2021 (UTC)

Faaaaark Special:WhatLinksHere/Author:No contributor recorded, and I will bot these ... when my other run is finished, then I will get Wikisource-bot to touch all those pages in that category again for the fourth time today. At one a second it is either a fun bucket, or a bun ... — billinghurst sDrewth 16:50, 19 May 2021 (UTC)

on having and being a bad experience

10962243 That made me laugh, last February, because "Replay Gain" gives really good info for remastering and when dreaming/thinking sounds like a rant.

I was trying to find a way to make better conversion of color to monochrome before my computer broke, but there was no reliable way yet. Some conversions were great for some but not for others and I could not figure out the why yet. Like it could have been darker needs this not that, or more reds are better with that, not this. It surely was not as consistent as clipping in sound files.

Any rant would have been directed at myself. The decision making that follows the simple monochrome or color toggle was my personal boggle.

Then there's the weird fact I learned a while back about how TV images were delivered when we used antennas: both color and b&w were sent on the same wave. People are so clever! Such a thing is terrible, though, for the digital realm, making downloads and files bigger -- but I really thought back then that the conversion was in the device! But the filming, or the processing of it, is very different for the two if you care about quality.
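A minimal sketch of why different colour-to-monochrome conversions can disagree: a grey level is a weighted sum of the red, green and blue channels, and the standard weightings differ. This assumes 8-bit gamma-encoded RGB and uses the ITU-R BT.601 and BT.709 luma coefficients. Analogue colour TV (NTSC and PAL) carried a luma signal built with essentially the BT.601 weights, which black-and-white sets displayed on its own while colour sets also decoded the separately carried colour information; that is the "same wave" trick mentioned above. The names `to_grey`, `REC601` and `REC709` are illustrative only.

```python
# Luma weights: green contributes most, blue least.
REC601 = (0.299, 0.587, 0.114)     # SD video; essentially the analogue TV weights
REC709 = (0.2126, 0.7152, 0.0722)  # HD video; renders reds noticeably darker


def to_grey(rgb, weights=REC601):
    """Convert an (R, G, B) tuple of 0-255 values to a single 0-255 grey level."""
    r, g, b = rgb
    wr, wg, wb = weights
    return round(wr * r + wg * g + wb * b)


pure_red = (255, 0, 0)
print(to_grey(pure_red))          # 76 with BT.601 weights
print(to_grey(pure_red, REC709))  # 54 with BT.709 weights
```

The same red comes out quite a bit darker under one weighting than the other, which is one plausible reason a conversion can suit one scan and not another.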

More rant?--RaboKarbakian (talk) 15:42, 19 May 2021 (UTC)

So, not only are my headers gone but now templates are too. This diff: https://en.wikisource.org/w/index.php?title=Help_talk%3APreparing_for_export&type=revision&diff=10962243&oldid=10962213

--RaboKarbakian (talk) 15:45, 19 May 2021 (UTC)

Traffic Signs Manual/Chapter 1 (1982 revised 2004) template use fail b/w browsers

If I look at that page in Firefox (w10, 88.0.1) it is a fail, though it looks okay in Chrome. I will leave you to work out the CSS issues, as they are beyond me. — billinghurst sDrewth 01:02, 23 May 2021 (UTC)

@Billinghurst: could you be a bit more specific about what is failing? It looks pretty OK to me. Inductiveload—talk/contribs 12:04, 24 May 2021 (UTC)

Apologies, was juggling 5 tasks again, half messages. phabricator:F34466027 numbering top right. — billinghurst sDrewth 12:25, 24 May 2021 (UTC)

@Billinghurst: OK, can you have a quick check to see if it's working now? Feels like it might be the same as before with the line-height, this time on {{fs}}. Inductiveload—talk/contribs 12:31, 24 May 2021 (UTC)

Yes, resolved. Though it is a concern that the same browser behaves differently depending only on the underlying platform, especially where that platform is a reasonably common operating system. — billinghurst sDrewth 12:37, 24 May 2021 (UTC)

Seems like Windows + Firefox is sensitive to line-height for larger fonts in a way no other renderer is. I'm not sure if that's a browser bug or undefined or what (@Xover: do you know?). But the "solution" is apparently to set line-height where needed. Annoyingly, there appears to be no way for a non-Windows Firefoxoid to know if this is happening. Even at [6], I can't see this on Windows 10 + FF 88, so it might just be you have something weird on your system? Inductiveload—talk/contribs 12:45, 24 May 2021 (UTC)

Billinghurst is using monobook, and there the skin sets line-height as a length value (1.5em). A length value is resolved to an absolute length on the element that sets it, and children inherit that computed length rather than the 1.5 factor. So for a block with font-size 127% and line-height 1.5em, with a base font size of 10px, the child will inherit a line-height of

$$(1.27 \times 10\,\mathrm{px}) \times 1.5 \approx 19\,\mathrm{px}.$$

If you then set font-size to 300% or 800% you'll have glyphs of size

$$12.7\,\mathrm{px} \times 300\% = 38.1\,\mathrm{px}$$

and

$$12.7\,\mathrm{px} \times 800\% = 101.6\,\mathrm{px}$$

inside a 19px line-height. When the line in question is forcibly broken you end up with two overlapping line boxes.

This is only an issue in monobook (Vector sets line-height as a unitless number so it inherits properly), and the kind of difference between monobook and Vector that was among the driving factors for developing the new skin (the visual bling-bling too, of course, but monobook is very outdated from a technical perspective too). Xover (talk) 14:40, 24 May 2021 (UTC)
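To make Xover's arithmetic concrete, here is a small Python sketch of how the inherited line height comes out under the two conventions. It only models the case discussed (an `em` length versus a unitless number set on the parent) and the function name is made up for illustration; real CSS also has other length units, percentages and the `normal` keyword, which this ignores.

```python
def child_line_height_px(parent_font_px, line_height, child_font_px):
    """Line height (px) a child ends up with when it inherits line-height.

    line_height is either a unitless number such as 1.5 (Vector-style)
    or a string ending in "em" such as "1.5em" (monobook-style).
    """
    if isinstance(line_height, str) and line_height.endswith("em"):
        # A length is resolved once, against the parent's font size,
        # and the child inherits that fixed pixel value.
        return float(line_height[:-2]) * parent_font_px
    # A unitless number is inherited as a factor and applied to the
    # child's own font size.
    return line_height * child_font_px


parent = 1.27 * 10  # 127% of a 10px base font: 12.7px
child = parent * 3  # font-size: 300%, about 38.1px
print(child_line_height_px(parent, "1.5em", child))  # about 19px: lines overlap
print(child_line_height_px(parent, 1.5, child))      # about 57px: room for the glyphs
```

With the `em` form the 38.1px glyphs sit in a line box of roughly 19px, which matches the overlap seen in monobook; with the unitless form the line box grows with the font.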

OK, that explains why I couldn't see it on a browser test site - the default for un-logged-in is Vector. At least we don't have a horrible platform thing going on like I thought. Inductiveload—talk/contribs 15:08, 24 May 2021 (UTC)

A little birdie…

…uploaded a new version of File:Birdcraft-1897.djvu back in March. Do you happen to recall why and what the changes were? Xover (talk) 11:56, 24 May 2021 (UTC)

@Xover: I think it was a mis-aligned text layer (phab:T219376 perhaps?), so this was a rebuild from JP2. I always forget the chunkedUpload thing uploads the moment you choose the file, rather than allowing you to add a comment and submit. Sorry! Inductiveload—talk/contribs 12:02, 24 May 2021 (UTC)

Was it due to a request, or randomly grabbed from a backlog, or…? No rejigging of pages, deleting extraneous pages, etc. beyond just rebuilding the DjVu? Not really important, but I'm looking at some other maintenance around this work and wanted to check for any surrounding factors or context; and the reupload without finding a WhatLinksHere trigger was a loose straw is all. (also, been there done that, etc.; BCU is very much v. 0.0.1 in that sense) Xover (talk) 12:10, 24 May 2021 (UTC)

I think I patrolled something tangentially related to that page (probably via some discussion with RK) and noticed the "needs fixing" tag and the summary of diff. Which I now notice I didn't change to "to be proofread" -_-. Inductiveload—talk/contribs 12:14, 24 May 2021 (UTC)

Ah, that's what I figured. Thanks! Xover (talk) 12:16, 24 May 2021 (UTC)

Completed Texts not Removed from Front Page Template for MC

I just noticed that Sense and Sensibility Volume 1 is completed, but is still on the front page. Is there any way to remove completed texts from the sprint section of the MC front page template? Languageseeker (talk) 04:11, 22 May 2021 (UTC)

@Languageseeker:

Done. Inductiveload—talk/contribs 14:33, 22 May 2021 (UTC)

Thanks! Languageseeker (talk) 20:45, 25 May 2021 (UTC)

RH template

Hi,

just to let you know..... It looks like there has been a small change in the working of the RH-template. For instance on this page. When I get the browser-window below a certain width, the middle section is displayed over two lines, while in the meantime there is still a lot of space left on both sides of the text "THE LIFE OF W.M.". Does this have anything to do with the changes you recently made in the template? --Dick Bos (talk) 16:28, 25 May 2021 (UTC)

@Dick Bos: hmm yes, good spot. Is it better now? Inductiveload—talk/contribs 16:42, 25 May 2021 (UTC)

Yes! You fixed it. Thank you! --Dick Bos (talk) 16:47, 25 May 2021 (UTC)

Setting Covers for Export

I know that it's possible to manually set the cover of a work for export. In the Index ns, the cover is already set. Would it be possible to use this information to automatically set the cover for export? I think the French do it that way. Languageseeker (talk) 20:53, 25 May 2021 (UTC)

@Languageseeker: Only if you use the header=1 parameter of the page tag, which can be a little bit fragile and so is rarely done at enWS. Also often the index "cover" is the title page, even if there's a decent actual cover. And you may wish to set a different cover to the one in the file, for example if there's a sticker or something on the DjVu. Inductiveload—talk/contribs 23:08, 31 May 2021 (UTC)

May 2021 Monthly Challenge

Can you take a look at what's going on with the May MC? The number of pages for yesterday were 0. Languageseeker (talk) 00:28, 1 June 2021 (UTC)

Lua error on Main Page, cf. WS.S#Error on Main Page. Probably a month-rollover bug when parsing Module:Monthly Challenge daily stats/data/2021-06; and my first guess would be that it's related to the datastructure being zero-indexed vs. datetime months indexing from one. Xover (talk) 11:00, 1 June 2021 (UTC)

@Languageseeker: yep, that's handled now - basically on the first of the month, it "forgot" to update the previous month's data one last time. I'll adjust the script/cronjobs as needed and hopefully it will Just Work (TM) next time round.

@Xover: since I ran the script on the toolforge before seeing the Main Page, I never saw an error. I fixed a couple of div-by-zero bugs for the first of the month though. Do you recall what the error (roughly) was so I can make sure it doesn't happen again if the month data doesn't get updated on time again in future? It's fine if you don't, I'll try to recreate the environment in a sandbox somehow and see if I can make it go bang. Inductiveload—talk/contribs 11:09, 1 June 2021 (UTC)

Lua error in Module:Monthly_Challenge_statistics at line 178: attempt to index field '?' (a nil value). Trace:

1. Module:Monthly_Challenge_statistics:178: ?
2. [C]: in function "gsub"
3. Module:Monthly_Challenge_statistics:166: in function "chunk"
4. mw.lua:525: ?
5. [C]: ?

Xover (talk) 11:33, 1 June 2021 (UTC)

Oh, and main page stats should probably show last month's stats as well as current month, so all the zeroes on the first of the month don't look too pathetic. :) Xover (talk) 11:37, 1 June 2021 (UTC)

Thanks for the info, that's helpful. Not exactly where I thought it was going to be, TBH.

Last month stats is also the plan, but we have only had the stats for a few hours! Inductiveload—talk/contribs 11:39, 1 June 2021 (UTC)

@Xover: Last month stats now live and (I think) non-explodey. Modules now seem to not explode when handed a month that doesn't exist. Inductiveload—talk/contribs 17:40, 1 June 2021 (UTC)
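The actual fix lives in the Lua modules and the Toolforge script, neither of which is shown here. As a hedged illustration of the two guards discussed in this thread (rolling back to the previous month across a year boundary, and not indexing or dividing by a month that has no data yet), here is a Python sketch. The `stats` layout and the function names are hypothetical and are not the real format of the daily stats data module.

```python
from datetime import date


def previous_month_key(today):
    """Return the "YYYY-MM" key of the month before `today`."""
    # Months run 1-12, so January must roll back to December of the prior year.
    if today.month == 1:
        return f"{today.year - 1}-12"
    return f"{today.year}-{today.month - 1:02d}"


def average_pages_per_day(stats, month_key):
    """Average daily page count for a month, or 0.0 if there is no data.

    `stats` is a hypothetical dict mapping "YYYY-MM" to {day: pages}.
    """
    days = stats.get(month_key)  # a missing month gives None instead of an error
    if not days:                 # also covers an empty month on the 1st
        return 0.0
    return sum(days.values()) / len(days)


stats = {"2021-05": {1: 120, 2: 80}}
today = date(2021, 6, 1)
print(previous_month_key(today))                                # 2021-05
print(average_pages_per_day(stats, "2021-06"))                  # 0.0
print(average_pages_per_day(stats, previous_month_key(today)))  # 100.0
```

Showing last month's figures alongside the current month, as Xover suggests, then only needs the rolled-back key.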

Parameter handling (CSS attribute generation).

Following on from the former post... This is my attempt at a specification of what I was trying to do with CSSline. (In writing it I found a flaw in my own logic concerning the current version.)

"The CSSline template generates a CSS attribute and value pair if and only if an actual user supplied value is present and not the same as the default value (which will have been specified in a TemplateStyles sheet elsewhere).

The inputs to the CSSline template are :-

The CSS attribute to use.
The user supplied value for the attribute.
The default value for the attribute concerned.

The output of the CSSline template shall be :-

A correctly separated and terminated CSS attribute and value pair, if and only if the user supplied value is present, and is neither
- the nil or empty string, nor
- the value '@std', nor
- identical to the provided 'default' value

'empty' if no user supplied value is provided.
'empty' if a 'nil' or empty string is provided for the user supplied value.
'empty' if a value of @std is given for the user supplied value (implying use of the standard or default value).
'empty' if the user supplied value is the same as (or expands to the same as) the default value. (NB It seems I had not yet fully coded this..)

"

Is that what you meant by a specification? ShakespeareFan00 (talk) 19:45, 1 June 2021 (UTC)
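A literal Python rendering of the CSSline specification above, as a hedged sketch: `css_line` and its argument names are invented for illustration, and the "expands to the same as" case is only approximated by comparing trimmed strings, because template expansion cannot be modelled here. In wikitext a similar effect would normally come from `#if`/`#ifeq` parser functions.

```python
def css_line(attribute, value=None, default=None):
    """Return "attribute:value;" or "" according to the CSSline spec."""
    if value is None:
        return ""                # no user supplied value
    value = value.strip()
    if value in ("", "@std"):
        return ""                # empty, or an explicit request for the default
    if default is not None and value == default.strip():
        return ""                # same as the default held in the stylesheet
    return f"{attribute}:{value};"


print(repr(css_line("text-align", "@std", "left")))  # ''
print(repr(css_line("text-align", "left", "left")))  # ''
print(css_line("text-align", "right", "left"))       # text-align:right;
```

Only a genuinely different value produces output, so the default stays in the stylesheet and no inline style is generated for it.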

A specification for (left/right/LR/RL) sidenote would be too extensive to post here; would my Userspace be appropriate? ShakespeareFan00 (talk) 19:48, 1 June 2021 (UTC)

Handling CSS attribute generation with a default (@std) parameter...

Not that it matters, because I lost the support of people here to get any kind of review but I'd debugged this:- Template:Right sidenote/sandbox/CSSline

I then reimplemented the sandboxed sidenote templates to use @std (making sure I was calling the updated ones: very big note to self here) and made a test page:- Page:Sandbox.djvu/257

As far as I can tell, in the Page: namespace the sandbox is functional, with the same functionality as the equivalent live versions, but the sandboxed version doesn't generate inline styles unless needed, with the defaults in a suitable CSS stylesheet. I also reimplemented the "special/_special" handling. It's how I generate the color formatting, by setting up a contrived Indexstyle approach!

Not that I am in any position to ask, but a review of the sandboxed versions would be appreciated. (Comments at Scriptorium noted.) What's being done here doesn't currently affect the outer wrapper, so the updated version should be compatible with Dynamic layouts in Mainspace.

I've not re-sandboxed the Outside L, Outside R etc family because those only work in Layout 2, and a fuller spec would be needed to get them working in all layouts. ShakespeareFan00 (talk) 17:04, 1 June 2021 (UTC)

@ShakespeareFan00: Well, I'm still not entirely sure what your exact goal is here (not necessarily a description of what the sandboxes currently do, but what the outcome you are after is: beware the w:XY problem). Sidenotes have lots of issues (and lots of implementations, each with their own quirks) and as was mentioned to you before, it may (or may not) be impossible to hit every possible outcome 100%. Progress towards improving them is slow, but it is happening (it's bound up tightly with the pagenumbering stuff as you well know, so general improvements there are helping too).

As usual, you haven't documented any expectation about what your templates are supposed to do, so I can't really tell what that template does. Moreover, I don't even know that you know that it's doing what you think it is. E.g. parameter 3 isn't even used.

On the technical level, I am unsure what {{Right sidenote/sandbox/CSSline|text-align|{{{align|@std}}}|left}} achieves that something "conventional" like {{#if:{{{align|}}}|text-align:{{{align}}};}} does not.

Noting also that it's nearly always better to put the default (i.e. left) in a TemplateStyles sheet if possible, because if any style ends up inline on an element directly, it'll always win a specificity battle with all other CSS (unless that CSS used !important, which is nearly always a code smell). Sometimes this is what you want, but usually not.
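The point about inline styles winning specificity battles can be illustrated with a toy model of the CSS cascade. This Python sketch is a simplification: it considers only author styles for one property on one element, and ignores user-agent and user origins, cascade layers and inheritance. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Declaration:
    value: str
    important: bool = False         # declared with !important
    inline: bool = False            # came from a style="..." attribute
    specificity: tuple = (0, 0, 0)  # (ids, classes, type selectors)
    order: int = 0                  # later declarations have larger numbers


def winning_value(declarations):
    """Return the value that wins for one property on one element.

    Simplified author-only cascade: !important beats normal, then inline
    style beats any selector, then higher specificity, then later order.
    """
    best = max(declarations,
               key=lambda d: (d.important, d.inline, d.specificity, d.order))
    return best.value


sheet_rule = Declaration("left", specificity=(0, 2, 1), order=1)  # e.g. a TemplateStyles rule
inline_rule = Declaration("right", inline=True, order=2)          # put inline by a template
print(winning_value([sheet_rule, inline_rule]))                   # right

forced = Declaration("center", important=True, specificity=(0, 1, 0), order=0)
print(winning_value([sheet_rule, inline_rule, forced]))           # center
```

The only way for a stylesheet rule to beat the inline value in this model is `!important`, which is why keeping defaults out of inline styles matters.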

It might be that if you genuinely are running into Mediawiki template limitations (and I am not really sure you are) that a module will allow you to do what you need to. Inductiveload—talk/contribs 17:38, 1 June 2021 (UTC)

(Sigh) It seems pointless to attempt further explanations, or waste my time until someone has actually written a specification.. ~~Good luck with that.~~ ShakespeareFan00 (talk) 19:19, 1 June 2021 (UTC)

In respect of CSSline (see below). ShakespeareFan00 (talk) 19:46, 1 June 2021 (UTC)

@Inductiveload: In respect of the live implementation, it should be the one you implemented in July 2020, because that's the last one I consider "stable." ShakespeareFan00 (talk) 09:41, 2 June 2021 (UTC)

An Apology..

I'd like to apologise.

My rantings over what inevitably turn out to be typing errors or logic errors on my part are not the sort of calm, professional attitude expected here.

Thank you for having the patience to respond in an entirely calm and helpful manner despite this, and I hope that you will feel able to respond to more intelligently posed technical queries in the future. ShakespeareFan00 (talk) 18:53, 6 June 2021 (UTC)

Chapter headings

Prompted by a text I was doing minor maintenance on that got away from me, I've started thinking a little about our previous discussions about per-work CSS and chapter headings as a good starting point. Braindump:

We currently have {{h}}/{{heading}} (html hn tags) and {{ch}}/{{chapter heading}} (div with inline styles), neither of which seem easily adaptable to a modern implementation in place and will effectively have to be replaced.

So maybe we create {{styleable chapter heading}} ({{sch}}) to start. It'll wrap its argument in a span with a standard ".ws-chapterheading" class, and be explicitly scoped down to single-line simple headings (hence the span) to avoid accumulating cruft over time. Chapter headings get basic default styling in global CSS, equivalent to display:block, centered, and xl. Per-work CSS and TemplateStyles should have greater specificity and so should be able to override this with no special magic. To provide some flexibility the template accepts a class argument that lets you specify an extra CSS class to add to the span, but enforces that the class name must start with two underscores. We then explicitly reserve such class names for use in per-work CSS.
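As a rough sketch of the output proposed for {{sch}}, assuming the template would emit exactly one `span` carrying `ws-chapterheading` plus an optional per-work class, the class-name rule could be checked like this. The template does not exist yet, this Python function is purely illustrative (a real implementation would presumably be wikitext or Lua), the heading is treated as plain text, and global preset classes are not modelled.

```python
import html


def styleable_chapter_heading(text, extra_class=None):
    """Return the HTML the proposed {{sch}} template might emit (hypothetical)."""
    classes = ["ws-chapterheading"]
    if extra_class:
        # Per-work classes are reserved names: they must start with "__"
        # and be a single class (no spaces smuggling in extra classes).
        if not extra_class.startswith("__") or any(c.isspace() for c in extra_class):
            raise ValueError(f"invalid per-work class name: {extra_class!r}")
        classes.append(extra_class)
    # A span keeps the template limited to single-line, inline content.
    return f'<span class="{" ".join(classes)}">{html.escape(text)}</span>'


print(styleable_chapter_heading("CHAPTER III."))
print(styleable_chapter_heading("CHAPTER III.", "__malone"))
```

A per-work stylesheet would then target `.ws-chapterheading.__malone` (or just `.ws-chapterheading`) without the template needing any style parameters.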

I suspect that on examination of existing uses of {{h}} and {{ch}} we will find a large proportion of the uses are actually "used on principle" or "used in ignorance" where they're used for the default styling of the respective template. In my experience with other such templates I also hold it likely that a large proportion of the rest will fall into a relatively small number of categories. I'm thinking these may be worthwhile to have predefined global classes for, but that needs careful consideration. If global preset classes are provided, these should be specified in the class parameter in place of local per-work classes.

I'm thinking we keep the template lean and mean, and do all we can to discourage scope creep. And be clear that the template is intended to be semantic first and foremost, and must be used with either a global preset or a per-work CSS.

In any case, if we're happy about {{sch}} we then actively migrate existing uses of the old headers to the new one (which global presets will make much simpler compared to creating bespoke per-work CSS for existing works), and then redirect them. At least as a first approximation we shouldn't have more than one template for this particular purpose. More complex headers can exist, of course, but they should either wrap {{sch}} or roll their own; and all our docs and guidance to new users should nudge them toward this template.

Experience from this process should be a good basis for starting to think about what else we can do in terms of more semantic templates and leveraging per-work CSS.

I'm already thinking that for big brutes like EB1911, creating as-fancy-as-you-please per-work CSS is a reasonable and commensurate burden. But for most works here, and for most contributors, the complexity, skillset required, effort, etc.… do not match up. We need to find ways to provide stuff "for free" to be able to leverage it routinely. Either predefined stylesheets with very common varieties, or some kind of friendly visual editor to add snippets of CSS under the hood but presenting a GUI-ified interface to overriding certain properties ("All headings should be green instead of black"). At a minimum we need to provide boilerplate CSS and guidance. And even then I am presuming that using per-work CSS will be the exception rather than the rule.

In any case, I may throw together {{sch}} to try it out on the poor text that's become my lab rat for several such experiments. Thoughts apt to adjust or confirm course would be very welcome before I squat on another nice short and mnemonic template name. :) Xover (talk) 08:01, 3 June 2021 (UTC)

I take it you're not aware of {{heading}}, a GO3 creation which I've been using for several years. Beeswaxcandle (talk) 08:06, 3 June 2021 (UTC)

@Beeswaxcandle: Yeah, I just flubbed the template link above ({{header}} has a, uhm, slightly different function :)). Part of the goal here is to get away from things like {{heading}} and {{ts}} that put all the formatting inline in the template invocation, and move to a template that just expresses the semantics ("This is a chapter heading") and leaves the formatting to the newly supported per-work style sheets (you may have noticed a new "Style" tab on index pages). Provided we can make it work well and be user friendly it's a much cleaner solution from a technical perspective. It'll also, potentially, let us have much more simple and consistent templates for many features that are common across works but formatted slightly differently. Xover (talk) 08:41, 3 June 2021 (UTC)

@Xover: check out {{plain heading}} (yes, it's a rubbish name, I need to think of a better way to describe these "classy" templates in a quasi-standardised way).

However, I am pretty sure the semantics are "suboptimal" in that it's currently

<hX id="First_line">
<span>First line</span>
<span>Second line....</span>
</hX>

(NB: div-in-hX is ill-formed HTML). I'm not sure it shouldn't be more like:

<hX id="First_line">
  First line
</hX>
<div class="subtitle">Second line....</div>

Also, it has an annoying tendency to add auto-TOCs. Not sure about the best way to handle that.

Inductiveload—talk/contribs 09:15, 3 June 2021 (UTC)

Using hn is always going to be a fight with core, skins, and any number of Gadgets etc. (for example, {{plain heading}} gets random appearances of "[ link | permalink ]" from w:User:Xover/EasyLinks.js). And the semantics aren't noticeably a good match for chapter titles in a book either (it'd work fine for our more web page-like content, such as docs and policies). That's why I specified span for this above.

And also for the second problem with {{plain heading}}: it's trying to deal with multi-level headings and all the attendant complexity. We don't need to put everything in one template: if we need multiple heading lines, use multiple templates. There's a similar case with the common heading followed by a decorative {{custom rule}}. Or what about that pithy quote following the chapter heading? Or the list-of-subjects-covered-in-this-chapter? Orley Farm is an outlier in that the second line can actually be argued to be a part of one full chapter heading, but most such constructs aren't or are only dubiously so. UNIX philosophy applies: do one thing, do it well, and make sure you play well with all the other doing-one-thing-well templates.

PS. Did I mention… I can guarantee the annoying(ly) part, but whether also right is a completely orthogonal issue. :) Xover (talk) 18:48, 3 June 2021 (UTC)

{{Plain heading}} isn't seeking to be the way to do everything and anything with headings, but the subtitle line seems a common enough pattern to merit a second parameter to me? If a work doesn't suit it, then don't use it and use something more suitable. That's pretty much why I specifically didn't add style, to stop people piling crap into it. As before, I can be convinced to drop the use of hN tags.

Options I can see (for the Orley Farm type, not for a generically complex case) for the calling API (i.e. what an editor writes in Wikitext: the generated tags are roughly the same in each case, mostly modulo where the CSS lives)

Status quo: direct formatting: {{center}} etc. - CSS inline/maybe TemplateStyles
Something like {{plain heading}} (takes heading and maybe subtitle iff suitable) - CSS in index styles
A heading template and a subtitle template - ditto
A heading template and a generic classed template like {{classed div|subtitle|The Content}} - ditto
Two generic classed templates (i.e. {{classed div|chapter_heading|Chapter 42}} and {{classed div|subtitle|The Content}}) - ditto
Work-specific templates (overkill for most works) - CSS inline/probably TemplateStyles, maybe index styles

How do you see the calling convention in your ideal world? Inductiveload—talk/contribs 22:47, 3 June 2021 (UTC)

The point I was failing to make is that the world (our works) is arbitrarily complex, so I don't think Orley Farm as a type specimen tells us very much. The pattern of a heading followed by some other text preceding the start of the body text of a chapter is common, yes, but what the contents of those two "lines" is varies remarkably. Compare Repulsing the Eater of the Ass, Propertius Confesses, and Page:Life of Edmond Malone.djvu/57 (all of which I need to finish up, I am reminded). If the template does two of the chapter heading lines, why shouldn't it do three?

My thinking here is that we should go to the smallest atomic unit that we can reasonably do, and leave the combination of them to users for each use case. A chapter heading is then a completely generic construct common to every published book that has chapters, or as near as makes no statistical difference. The same goes for a "chapter subheading". Both can be display:block by default, but an index style can choose to make them inline-block instead if that makes sense.

That block of stuff beneath the chapter heading and possibly a subheading can then be a div based template with a suitable name and aliases (chapter toc? chapter quotation? chapter summary?), and can fit all sorts of stuff that you don't want to try to handle in the heading qua heading.

That gives us three templates (possibly all backed by the same Lua module, of course), all of which are optional, and that can be combined in infinite ways (including non-css-based stuff interspersed between the heading and subheading at need), but will fit almost all works and is easy to teach (as well as can be expected) and thus can get ingrained in muscle memory.

I don't like "classed div", for the same reasons you've previously expressed misgivings about similar, and ditto why I abandoned {{sbs}}: it goes one step too far towards just writing raw HTML. I think we'll probably need div-, span-, and p-with-a-class templates to deal with special cases, but I don't think these should be recommended, and they are certainly not a model to follow for other templates. Xover (talk) 13:14, 5 June 2021 (UTC)

I understand the sentiment, but I'm not quite clear what your ideal wikitext for, say, Page:Life of Edmond Malone.djvu/57 would be? Could you just brain dump what you think the wikitext should look like and we can fight about something concrete? ^_^

For example the examples above:

Status quo:

{{center|{{x-larger|CHAPTER III.}}}}

{{center|{{smaller|1769–1777.}}}}

{{hi|{{smaller|Law Studies—Irish Duels...}}}}

His return...

Plain heading (with Index CSS) (BTW, {{plain heading}} as it stands can do this)

{{plain heading|l=2
 |CHAPTER III.
 |1769–1777.
 |Law Studies—Irish Duels...
}}

His return...

"Semantic" templates (with Index CSS)

{{??? heading|CHAPTER III.}}

{{subtitle|1769–1777.}}

{{section content|Law Studies—Irish Duels...}}

His return...

All generic classed templates (with Index CSS)

{{classed div|chap_title|CHAPTER III.}}

{{classed div|chap_subtitle|1769–1777.}}

{{classed div|chap_content|Law Studies—Irish Duels...}}

His return...

Out of these (or something in-between, or something else entirely) how do you envision this in an ideal case? I am cognisant of more complex cases existing, but on the assumption that there is always a more complex case, there will always be a point at which it'll be easier to just go back to direct formatting, so my thought is to make the 95% cases easy and worry about the 5% cases later. If the 5% can be coerced into a wider framework, then great; if they can't, then don't make the 95% case impractical in order to accommodate them. Inductiveload—talk/contribs 10:00, 7 June 2021 (UTC)

Re We need to find ways to provide stuff "for free" to be able to leverage it routinely. Either predefined stylesheets with very common varieties, or some kind of friendly visual editor to add snippets of CSS under the hood but presenting a GUI-ified interface to overriding certain properties ("All headings should be green instead of black"). At a minimum we need to provide boilerplate CSS and guidance. And even then I am presuming that using per-work CSS will be the exception rather than the rule. This is roughly my feeling. I'm not going to go hard or fast on the CSS in general, because we need to see how it shakes out when there is better template support. A "GUI-ified" interface would be pretty sweet, but technically rather tricky to keep in line if someone edits the CSS as code. A "wizard" to create snippets would be easy enough, and there is already some CSS help in the WikiEditor which can be built on (wouldn't it be nice if Vue.js could be a thing in time for that...).

I also need to figure out if I can get the "preview page with this CSS function" working on the PHP backend, because editing the CSS blind is really a bummer. Inductiveload—talk/contribs 09:31, 3 June 2021 (UTC)

Per Index styles to be available per Index: page

Hi. If you look at the inserted TOC on Index:The case for women's suffrage.djvu you will see that the right hand column has not been right-aligned (per index style). It seems not to be picking up the style. While this is less consequential on this page, some ToCs will have more complicated and reliant formatting, so I was thinking that we need to apply the styles to the mediawiki: index page template. Am I missing a potential downside? — billinghurst sDrewth 04:46, 6 June 2021 (UTC)

Due to implementation details (see my comment at 18 March 10:47pm and Tpt's reply here), you only get auto-styles with the <pagelist> tag (or in page NS). But this tag doesn't work in the Index NS.

Probably, as you say, it makes sense to include the CSS via a TemplateStyles call in MediaWiki:Proofreadpage index template. Inductiveload—talk/contribs 09:43, 7 June 2021 (UTC)

Done. Not sure where you think we should drop a note saying that this happens. It probably matches most people's expectations anyway. — billinghurst sDrewth 12:22, 7 June 2021 (UTC)

@Billinghurst: I tweaked it a bit because not checking CSS existence before invoking makes it throw a red error.
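The guard described here amounts to "only emit the TemplateStyles tag if the stylesheet page exists". In wikitext that is the job of the ParserFunctions `#ifexist` function; the Python sketch below only shows the logic. It assumes the per-work stylesheet is stored at `<Index page>/styles.css` and uses a plain set of titles in place of a real page-existence check; the function name is hypothetical.

```python
def index_styles_tag(index_title, existing_pages):
    """Return a TemplateStyles tag for the index's stylesheet, or "" if none.

    `existing_pages` stands in for a real page-existence check.
    """
    css_page = f"{index_title}/styles.css"
    if css_page not in existing_pages:
        # No stylesheet: emit nothing rather than a red error message.
        return ""
    return f'<templatestyles src="{css_page}" />'


pages = {"Index:The case for women's suffrage.djvu/styles.css"}
print(index_styles_tag("Index:The case for women's suffrage.djvu", pages))
print(repr(index_styles_tag("Index:Some other book.djvu", pages)))  # ''
```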

I'll make a note at H:Page styles. Inductiveload—talk/contribs 13:07, 7 June 2021 (UTC)

Duh. Do we want to have it set to only display if Remarks is present too? Hmm, don't suppose it matters as if it fks up we need to know wherever it is. — billinghurst sDrewth 13:20, 7 June 2021 (UTC)

I think that's actually more confusing, as the CSS will apply to the whole page (TS isn't scoped to the HTML element it is declared in, which is actually what allows it to work for us at all), so gating it on remarks could be slightly surprising if you add a TOC and suddenly some CSS comes along and makes your pagelist Comic Sans. Also phab:T284449. Inductiveload—talk/contribs 13:28, 7 June 2021 (UTC)

Re: Main namespace files for download or export

Hi. Is this the root page layout needed for download? Please check the lower part of the page. The History of Slavery and the Slave Trade. The TOC links in the header point to the pages in the context of the book layout. Does this have to be done to every main namespace book with a TOC?— Ineuw (talk) 14:05, 26 May 2021 (UTC)

@Ineuw: pretty much, yes. There are a handful of ways to get around it in a pinch, but the simplest, most maintainable and most consistent way to achieve this is to put it on the root page. Also, IMO, this is far more user friendly for reading, since you can see the TOC from the landing page, without having to notice the contents link and follow it.

No link in the header is used for export generation, because headers are explicitly excluded from the content for export.

Also, you should never use pixel widths for things with text content. See H:PXWIDTH, but tl;dr you should probably use a text relative size like em. In this case ~30em will look roughly the same in the usual case. Inductiveload—talk/contribs 14:15, 26 May 2021 (UTC)

Got it. I will get back to you with some questions about your last point.— Ineuw (talk) 14:21, 26 May 2021 (UTC)

I am back as promised

I understood the importance of using "em" instead of pixels. Are there exceptions? My future specifications will be in "em", but what about the thousands of past uses? Should templates be modified? I am also aware that my personal settings for the browser, and a web page, have no bearing on what others see. My interest is in how to get a casual reader to notice the 4 possible layout options of a main namespace page.

As for placing the table of contents on a "root" page, this was not clear, because it could have meant a Page: ns page. I needed an example of a work which had a Page: ns Table of Contents that was transcluded to the Main ns, not one that was added to the Main ns by an editor. — Ineuw (talk) 20:03, 8 June 2021 (UTC)

@Ineuw: The main exception is if the container contains something in pixels, for example an image that you want something to be the same width as. Generally you don't really want to do this, because that can be surprisingly large or small depending on the DPI of the device, and the text might not be "suitable". For example, say you want a caption to be the same width as an image at 400px. That might make sense at 1em = 16px, but if the font is 48px on a device and the image is displayed 1:1 on a 1000px screen, the text will be squeezed into the middle 40% of the screen while the font is 3x the size, relative to the image, that you expect. So use with care.
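Putting numbers on the caption example: a box fixed at 400px always holds the same number of pixels, so the amount of text that fits per line shrinks as the font grows, while a box sized in em scales with the font. This Python sketch assumes the usual 16px browser default for comparison and uses made-up function names.

```python
def ems_per_line(box_width_px, font_px):
    """How many em-widths of text fit on one line of a fixed-pixel box."""
    return box_width_px / font_px


def em_to_px(width_em, font_px):
    """Pixel width of a box sized in em, for a given font size."""
    return width_em * font_px


# A caption box fixed at 400px, as in the example above.
print(ems_per_line(400, 16))  # 25.0 at a 16px font
print(ems_per_line(400, 48))  # roughly 8.3 at a 48px font: a third as much text
print(400 / 1000)             # 0.4 of a 1000px screen either way

# A box sized in em instead grows with the font.
print(em_to_px(30, 16))       # 480
print(em_to_px(30, 48))       # 1440
```

This is also why a ~30em width looks roughly like a 480px one in the usual 16px case, but keeps the same amount of text per line when the font is larger.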

Generally, the TOC should go on the root page even if it's actually of transcluded pages. Essays in Miniature looks like it exports well for example. Inductiveload—talk/contribs 20:28, 8 June 2021 (UTC)

Thanks. Downloading my own, and others' main page works is on my wikisource bucket list.— Ineuw (talk) 20:41, 8 June 2021 (UTC)

Using {{FI}}, my image widths never exceed 500px. This width is set in my vector.css for the Page: namespace so I can have margins surrounding the text, similar to the original. Since the images are my uploads as well, I see them from birth (.jp2) and am aware of the limitations of display. Besides, I found Commons' image viewing tools to be much improved (impressively).

On re-read, you missed the gist of my Main namespace TOC question. I always transclude the book's TOC to the Main namespace as it appears in the book's original layout. My question was: do I leave the transclusion of my original layout as well, and duplicate the transclusions on the root page? Or should my original TOC page be deleted when its contents are transferred to the main page? — Ineuw (talk) 20:35, 12 June 2021 (UTC)

Images from When We Were Very Young

So, the images from this book are being flagged as copyvio on Commons. I opened an undelete request there. PawełMM has done a great job on many of them, so is it possible to batch-localize the image files here? Languageseeker (talk) 01:05, 2 June 2021 (UTC)

I guess either get PawełMM to upload them here, or find a Commons admin to help (I am not a Commons admin, so I can't see the images). Inductiveload—talk/contribs 16:37, 2 June 2021 (UTC)

A Commons admin restored the files so that they could be transferred. However, Xover said that the pybot script is broken, so the transfer has to be done manually. There are around 121 files to transfer. Any suggestions, or is it just a matter of getting lots of coffee? Ideally, I would like to preserve the entire file history as well. Languageseeker (talk) 20:12, 2 June 2021 (UTC)

@Languageseeker: The batch import is running right now. The tools I know of do not allow for uploading the full file history, it only grabs the most recent revision. If you really care, you could manually upload the original pages over the top and then undo the change. Don't worry about the messy metadata and missing templates, I'll rip through it all with the bot once it's all imported. Inductiveload—talk/contribs 22:20, 2 June 2021 (UTC)

Thanks. You're amazing as always. I'll let you know if PawełMM decides to finish processing the rest. Really glad that these files are not lost. Languageseeker (talk) 23:13, 2 June 2021 (UTC)

You're welcome. Note that the files wouldn't ever be "lost". Commons might be strict with the copyright hammer, but the admins will lend a hand when we need (and we have a few Commons admins here too). Technically, files are never actually removed from the Commons DB, so we can always get them back, even if PawełMM had deleted them already from his computer!

Importing file history as actual revisions isn't technically possible (you could do the file description page, but not the file itself). That's the main reason why FileImporter/FileExporter exists: you need to be operating inside MW and talking to the revision table at a pretty low level to do it.

PS. Languageseeker, the metadata on these uploads is really quite shockingly bad. If you can't do better than this you should not be doing batch uploading at all, neither here nor at Commons. Please rein in your ambitions to be commensurate with your actual abilities, or ask for assistance. Xover (talk) 05:25, 3 June 2021 (UTC)

@Xover: I appreciate the explanation. Sorry about the poor metadata. I see that I mixed up the Author and the Title field in Patytpan. Besides that, how do you suggest I improve the metadata? Languageseeker (talk) 13:51, 3 June 2021 (UTC)

@Languageseeker: You can see an example of a basic file description page for a plate I extracted recently at File:Birdcraft (1903), plate 37.jpg.

The absolute minimum to get right are author, date, source, and licensing. But almost all files should also have a description field filled out with something sensible, and appropriate categories added.

For the source field, for images extracted from a book, you'll usually want to use c:Template:extracted from (or at least a link to the book's file description page). For licensing, on Commons, you need to make sure you are documenting the correct copyright status for both the work's source country and the US (unless the work was first published in the US). Their (and our too, for that matter) licensing templates leave a bit to be desired on the user-friendliness side, but it's still important to get it right. And as you've just learned the hard way, copyright for a book can get really complicated with various people contributing and having separate copyrights.

By the way, it looks like PawełMM is still uploading processed versions of these images at Commons after Inductiveload copied them here. See eg. c:File:Whenwewereveryyo0000unse i2b7 orig 0065.png vs. File:Whenwewereveryyo0000unse i2b7 orig 0065.png. Those are going to get deleted at Commons soon so you may want to coordinate that effort.

Oh, and the book scan from which these images were taken will also need to be either copied locally or redacted at Commons (the images are no less copyvio there just because they're part of the scan rather than in separate files). Xover (talk) 16:41, 3 June 2021 (UTC)

At least it gave me a chance to fix the imagetransfer script. Inductiveload—talk/contribs 23:19, 2 June 2021 (UTC)

Also of peripheral relevance: imagetransfer.py from Commons to enWS. Xover (talk) 07:13, 3 June 2021 (UTC)

PawełMM is absolutely amazing and finished processing all the images. Would you mind running the bot job again? I made a list of all the files to be localized here. There is also new metadata for the images. Languageseeker (talk) 14:09, 4 June 2021 (UTC)

@Inductiveload: Sorry to be a bit rude, but I think the clock is ticking on these. Languageseeker (talk) 20:24, 4 June 2021 (UTC)

@Languageseeker: The existing PWB script doesn't actually work for overwriting when the file already exists on Wikisource. I don't have time to write a new script for this tonight, so I suggest just uploading the newer images locally at enWS manually if it has to be done very soon. Inductiveload—talk/contribs 22:22, 4 June 2021 (UTC)

ok, I think they should be fine until tomorrow. Sorry this is causing so much trouble. Languageseeker (talk) 00:12, 5 June 2021 (UTC)

File list updated over the weekend. All files added into the book. So this should be the final list. BTW, this is the first appearance of a certain bear who is always looking for honey. Languageseeker (talk) 12:33, 7 June 2021 (UTC)

Sorry if you've been having a really busy week. Do you think you'll be able to do the script for this in the next few days? Languageseeker (talk) 18:58, 10 June 2021 (UTC)

Urgh, totally slipped my mind, I've been battling another Commons upload bug and also generally faffing with PHP! I'll take a look. Inductiveload—talk/contribs 19:02, 10 June 2021 (UTC)

OK, should be done now (inc. PDF). You can let the Commoners know they can pull the trigger on deletion. Inductiveload—talk/contribs 21:14, 10 June 2021 (UTC)

Thank you so much! Now I don't have to worry about them getting deleted anymore and can begin proofreading. Awesome work! Languageseeker (talk) 12:37, 11 June 2021 (UTC)

Table dimensions don't seem to work with "em"

Hi, I tried to define this table with "em" and it doesn't seem to work. What am I doing wrong? — Ineuw (talk) 00:46, 14 June 2021 (UTC)

Could you please clue me in on what I did wrong? — Ineuw (talk) 04:03, 18 June 2021 (UTC)

@Ineuw: You used a width attribute, not a width style. AFAIK, attributes only do px. But you should not use them anyway because they're obsolete and replaced with CSS. Inductiveload—talk/contribs 04:17, 18 June 2021 (UTC)
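
For illustration only, with a hypothetical class name: the legacy HTML width attribute on a table takes pixel (or percentage) values, whereas the CSS width property, whether inline in style= or in an Index/TemplateStyles stylesheet, accepts em. A class-based version might look like this:

```css
/* Hypothetical class, applied in wikitext as {| class="example-em-table" */
table.example-em-table {
    width: 30em;          /* em works here; a width="30em" attribute would not */
    margin-left: auto;    /* centre the table */
    margin-right: auto;
}
```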

Thanks for the clarification.— Ineuw (talk) 04:40, 18 June 2021 (UTC)

style tr <-> td interplay in css

Okay, how do I get "tr" and "td" formatting styles to work together, as we don't have "col"? Index:What colonial preference means.djvu/styles.css and Page:What colonial preference means.djvu/11 demonstrate that I haven't mastered what to do when styles intersect. Do I just have to go back and style inline with the overriding cellular {{ts|td}}? — billinghurst sDrewth 05:44, 14 June 2021 (UTC)
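
One possible approach, sketched with hypothetical class names for the Index styles.css: padding does not apply to table rows at all, so row-level styling has to reach the cells through a descendant selector. A "column" can be targeted by position, since there is no col to hang styles on (assuming the CSS sanitiser accepts :nth-child).

```css
/* Hypothetical classes; substitute your own in the Index styles.css. */

/* padding has no effect on a tr, so style the cells of the marked row. */
table.example-toc tr.example-total > td {
    padding-top: 0.5em;
    font-weight: bold;
}

/* No <col> available, so target the second cell of every row instead. */
table.example-toc td:nth-child(2) {
    text-align: right;
}
```

An inline style on a cell, such as one produced by {{ts}}, still takes precedence over any of these stylesheet rules (short of !important), so inline styling is only needed for genuine one-off cells.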

@import of local files fails for Index: css

Not sure whether this is related to our implementation, sanitised-CSS or something with the CSS editor. Both of these syntax entries fail:

@import url('https://en.wikisource.org/w/index.php?title=Index:My_Life_in_Two_Hemispheres,_volume_2.djvu/styles.css&action=raw');
@import 'Index:My Life in Two Hemispheres, volume 1.djvu/styles.css';

Unrecognized or unsupported rule at line 2 character 1.

I tried to change the page type to CSS, and that just threw up a VIEW page, so I'm running into a permissions thing; that is not the solution.

Tracked in PhabricatorTask T285173

Found that it is not allowed through sanitised-CSS. For certain works where we have volumes, I would think that is going to be horribly burdensome and lead to errors. You may be able to set up some code so that an Index uses another file; however, it seems that a local @import of another sanitised-CSS file in the same namespace should not be a significant risk, so I have created the tracked task. — billinghurst sDrewth 03:48, 20 June 2021 (UTC)

Yep, that will be super handy. The index CSS should be able to be a redirect, but since changing content model is an admin-only thing, it's very clunky and unfriendly. Inductiveload—talk/contribs 12:42, 21 June 2021 (UTC)

My Life in Two Hemispheres

Subsidiary question. I have not set the work to max-width: 38em; as that would make for a very long page on screens for no reason. It will still self-size to screen width, so will it be problematic? Feel free to set it in the css if you think that it is generally more beneficial to do so. — billinghurst sDrewth 04:04, 20 June 2021 (UTC)

Addendum: How acceptable is <blockquote>, per My Life in Two Hemispheres/Chapter 31? If it is problematic in its natural form, what code have you been applying for export, etc.? I am still dwelling on whether to flick code at it anyway now that I am progressing towards CSS 102. Even a 2em indent is irksome when I do an add-on check of the mobile form. — billinghurst sDrewth 04:40, 20 June 2021 (UTC)

@Billinghurst: There's no need to restrict width unless the width actually impacts readability. It's more of an issue when you have a very short chapter name and a number, with 70em between the two. For prose (including your TOC's summary blocks), widths should be left strictly up to the layout. It will "compress" horizontally as needed on export: phab:F34518109. The concern with widths is twofold:

- On very wide layouts (Layout 1 + fullscreen monitor + a skin like Monobook): is there a massive amount of whitespace that ruins readability? (Many TOCs fall into this.)
- On very narrow layouts (Mobile @ ~360px generally, Layout 2/4 @ 36em, some e-readers, visually-impaired users with large fonts): does the content spill off the right margin or generally fall apart? Images are mostly monkey-patched in CSS to avoid this, at least in Mobile and export. Generally, the places this falls apart are:
  - Tables: often there's naff all you can do; they're just wide things that made more sense on a printed page in 1873 than they do on a portrait screen in 2021.
  - Multiple columns with something like {{multicol}}, which abuses a table for the purpose. Often this can be done better with {{div col}} or {{flex wrap centre}}, which will degrade to a single-column layout; this works especially well for side-by-side images. For side-by-side text, such as parallel texts in treaties, Loebs and the like, I do not yet have a good solution that will export in any sane way.
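
A rough sketch of why those alternatives degrade better than a table, using hypothetical class names rather than the actual styles of {{div col}} or {{flex wrap centre}}: a column-width based layout and a wrapping flex container both drop to fewer columns when space runs out, instead of spilling off the right margin.

```css
/* Hypothetical classes, not the real template styles. */

/* Text columns: the browser fits as many 20em columns as the width allows,
   down to a single column on narrow screens. */
.example-text-columns {
    column-width: 20em;
    column-gap: 2em;
}

/* Side-by-side images: items wrap onto new lines instead of overflowing. */
.example-image-row {
    display: flex;
    flex-wrap: wrap;
    justify-content: center;
}

.example-image-row > * {
    margin: 0.5em;    /* spacing between wrapped items */
}
```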

Blockquote is exported just fine as HTML tags. The default MW skins add those annoying grey bars (they don't go to export). The {{quote}} template uses blockquotes but provides a saner default (2em on left and right), and can be overridden with index CSS via the class wst-quote and also custom classes if you need multiple variants within a single Index:. Inductiveload—talk/contribs 13:11, 21 June 2021 (UTC)
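
A sketch of such an Index CSS override, using the wst-quote class mentioned above. example-verse-quote is a hypothetical custom class, and how it gets attached depends on {{quote}}'s parameters (see its documentation). If the template's own rule turns out to be more specific, the selector may need to be made more specific too.

```css
/* Index styles.css: narrow the default 2em quote indent for this work. */
.wst-quote {
    margin-left: 1em;
    margin-right: 1em;
}

/* Hypothetical second variant within the same Index:. */
.wst-quote.example-verse-quote {
    margin-left: 3em;
    font-style: italic;
}
```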

An Inquiry Into the Causes and Effects of the Variolæ Vaccinæ

This is now (mostly) validated and transcluded. Could you create the other two images, please? Then the work can be fully proofread, and the other work can be removed to this one. TE(æ)A,ea. (talk) 16:25, 27 June 2021 (UTC)

@TE(æ)A,ea.: I'll try to get it done soon. It does take a little time to clean the images up since they're quite faint and coloured even more faintly, so you can't just smash them with background erase. Inductiveload—talk/contribs 06:17, 28 June 2021 (UTC)

I’m sorry, I didn’t mean to pressure you on this. I just wanted to remind you/ask you about it. TE(æ)A,ea. (talk) 14:39, 28 June 2021 (UTC)

July MC

Would you mind running the July Monthly Challenge? I'm swamped IRL. Languageseeker (talk) 01:42, 28 June 2021 (UTC)

@Languageseeker: me too, but I'll sort something out. Poke me if I have forgotten something, and tweak the source list as you want when it exists. The month change-over point is still manual, so expect a small bump on the 1st of the month again. Inductiveload—talk/contribs 06:19, 28 June 2021 (UTC)

OK, the basics are set up (main page, cat and data table). I haven't added much, feel free to chuck in a couple more (but I think not too many, since a couple of series will complete volumes soon) Inductiveload—talk/contribs 08:33, 28 June 2021 (UTC)

@Languageseeker: July MC now live and stats for June generated: 2051 pages. :-D Inductiveload—talk/contribs 02:59, 1 July 2021 (UTC)

Page:A Forest Hymn.djvu/17

This was flagged as misnested.

I've marked it as 'problematic' (and others in the same Index also flagged) because I can't see a clean solution to getting even an approximation of the layout without some considerable hassle.

Do you have a suggestion on how it might be done? ShakespeareFan00 (talk) 15:10, 4 July 2021 (UTC)

@ShakespeareFan00: I don't really, no. This formatting is extremely hard to replicate in HTML/CSS and I have never thought of a good way to robustly implement it. All our templates like {{overfloat image}} and so on are thoroughly broken when it comes to mobile and exporting. Inductiveload—talk/contribs 15:58, 4 July 2021 (UTC)

OCR visibility on changes feedback

Hello, would you be open to doing a user session to give feedback on OCR changes? It could entail a phone or video call with some questions on my end. Thanks for your proactive communications. Take care! NRodriguez (WMF) (talk) 21:51, 12 July 2021 (UTC)

@NRodriguez (WMF): Sure, I'd love to. Most afternoons European time work for me, but I might not be around at the end of this week. Inductiveload—talk/contribs 23:02, 12 July 2021 (UTC)

Let me know if Thursday or Friday afternoon would work, or if Monday may be an option. Thank you! Feel free to write to me at nrodriguez@wikimedia dot org NRodriguez (WMF) (talk) 18:19, 14 July 2021 (UTC)

Web development kinda sucks…

This breaks at line 40 in Safari due to this. Sigh. Xover (talk) 18:19, 21 July 2021 (UTC)

@Xover: Have you tried using a computer rather than a speak-and-spell with delusions of grandeur? :-D (but oh go on then) Inductiveload—talk/contribs 20:54, 21 July 2021 (UTC)

Don't go there. Just… don't go there. Xover (talk) 21:42, 21 July 2021 (UTC)

And a little bonus weirdness: c:Special:Diff/576257724. Before the change MediaWiki:Gadget-Fill Index.js failed silently; afterwards it fills as normal. I didn't trace it to the root cause, but I'm guessing the regex ends up grabbing "Creator:Kate Douglas Wiggin}}{{Creator:Nora Archibald Smith" and then later bails thinking it's gotten garbage data. A non-greedy match might plaster over it short term. --Xover (talk) 22:25, 21 July 2021 (UTC)

@Xover: Nothing so simple, though actually there was a sneaky greedy match too. The actual issue was in the DIY template tokeniser: Special:Diff/11522895. It's dirty, dirty code, but somewhat handy. Perhaps Parsoid will help here one day. Inductiveload—talk/contribs 23:43, 21 July 2021 (UTC)

Oh, I see. Or rather, I don't see, which amounts to the same thing. Yeah, that's pretty yucky; but unavoidable so long as we don't have a proper structured data store and a smart GUI for entering bibliographic metadata. Sigh, one day… I don't suppose you've run across any decent JS lib to access Wikidata yet?

Incidentally, throwing something (anything) into the console whenever you have code that bails out otherwise silently makes these issues much easier to track down. With all the minification and loaders and stuff (every script shows up as "load.php" in the debugger) I'm also getting inclined towards religiously including the script name and other identifying stuff in a comment at the top. Maybe even a convention to start dumping debug logs if there's "debug" in the URL? Because this particular issue aside (custom parsers are always going to be dense), most issues run into in the wild are pretty easily traced in the script logic itself, so most of the effort actually tends to be in peeling away the MW-specific stuff. Hmm. In fact, wouldn't it be neat if the URL param made MW set a wgDebug variable or something… Xover (talk) 08:10, 22 July 2021 (UTC)

Oh, and another thing you might want to do the next time you fiddle with Fill index.js: Commons uses c:Template:City to wrap locations for localization purposes. Not a lot of people use it so it doesn't show up much in the wild, but in theory it ought to be used. And when it is, Fill index ends up putting "{{City|New York City}}" in the Index. I'm the only one I've run across that uses it, so definitely not a high priority issue. Xover (talk) 08:16, 22 July 2021 (UTC)

It actually does spew if it chokes on not finding a Book template. The issue is that the borked template parser (it only happened if you had }}{{ in the parameter) reported that the template ended after the Translator parameter. So it looked valid, just really empty.

A better suite of "gadget utils" code would be handy (e.g. modular, levelled debug and consistent "registration"). One for the Infinite List of Infinite Infinities.

I've cut out {{city}}. That said most people don't even move the city to the city field, since the IA dumps it in publisher. Cough.

Re WD, I can't even get them to agree on the schema for their bibliographic data! Best advice so far: do it without asking and if you get far enough it becomes the schema.

Cheers, Inductiveload—talk/contribs 09:25, 22 July 2021 (UTC)

A BIG thanks for your help with the Bodleian Library scans!

A really BIG thanks for your help with the scans. The project is developing quickly, thanks to your quick, positive input! Llywelyn2000 (talk) 09:47, 23 July 2021 (UTC)

@Llywelyn2000: You're welcome! Let me know if I can do anything else. BTW, WS:Scan Lab is now a Thing (TM). Inductiveload—talk/contribs 10:20, 23 July 2021 (UTC)

Internet Archive transfer to Commons

I've been approached by an editor who would like to work on old Welsh ballads (pre-1900). There are c. 1,700 of them here on IA. You recently mentioned the tool, but I think that it's for a single djvu file, rather than batch transfer? If there is a batch transfer tool, please let me know, and if it automatically does OCR, then so much the better! Or maybe I wish too much! Thanks again! Llywelyn2000 (talk) 09:54, 23 July 2021 (UTC)

@Llywelyn2000: Indeed, IA-Upload is really for single files on a manual basis.

For a large bulk import from the IA of this kind, probably the best bet is to ask commons:User:Fæ, who has imported over 1 million files with their bot. The bot will only upload the PDF (there is no DjVu at the IA), but it will have the existing IA OCR in it. Since the IA correctly set the language to Welsh, that OCR should be fairly good (example: https://archive.org/stream/wg35-5-98/wg35-5-98_djvu.txt) Inductiveload—talk/contribs 10:16, 23 July 2021 (UTC)

Wow! Thanks Inductiveload! I've worked with Fæ in the past, who is so efficient with his work. I'll pop round and ask him now! Thanks again! Llywelyn2000 (talk) 11:43, 23 July 2021 (UTC)

Old English Soliloquies

Hi Inductiveload. You may remember chatting to me about Old English on Wikisource c. 2 months ago. I've been proofreading this text, Index:King Alfred's Old English version of St. Augustine's Soliloquies - Hargrove - 1902.djvu, and have done the whole main text. I might need some help to continue - I certainly will for validation. Do you know any active users with an interest in Old and Middle English? You mentioned on the Scriptorium that I might be able to suggest this for Monthly Challenge. What do you think? Regards Rho9998 (talk) 22:03, 17 July 2021 (UTC)

@Rho9998: I'm afraid I don't know of anyone off the top of my head.

This work is particularly tricky due to the parallel Latin translation on each page, so good work getting it to the current state. It should be possible to transclude at least the Old English now. You can nominate at Wikisource:Community collaboration/Monthly Challenge/Nominations and see what others think. It's borderline for me due to the difficulty of Old English for most people, but since it's already mostly proofread, you may find interest. Inductiveload—talk/contribs 21:02, 21 July 2021 (UTC)

@Inductiveload: I'm not sure that it would be appropriate for the biography month? Would it include autobiography? Augustine's Soliloquies is sometimes considered the predecessor of the Confessions, which is often considered the first autobiography (in Western tradition anyway). I might argue my case. In the meantime, by the way, I'm proofreading an OE text which uses acute accents on the vowels a and o. Would you be able to add these to the OME special character board, please? Rho9998 (talk) 11:41, 26 July 2021 (UTC)

Hmm.

Remind me… Why do we keep these in Gadget-Site.css instead of having RL load them whenever the page numbers gadget is loaded? Xover (talk) 09:20, 30 July 2021 (UTC)

@Xover: Was going to say I guess so that the defaults are applied even if there's no JS support, but that makes no sense since those elements are applied by JS. I have no idea. Maybe I'll move them into MediaWiki:Gadget-PageNumbers-core.css one day. Inductiveload—talk/contribs 14:50, 30 July 2021 (UTC)

Also… Much of the actual flakiness of pagenumbers comes from those inserted elements, and hoisting entire blocks around (usually after .ready has already fired). If you come up with any brilliant ideas for how we could get the proper HTML structure in place before pagenumbers.js goes to work I'm all ears. I'm even seriously mulling over whether the community would go for needing to put {{foo start}} and {{foo end}} on every single page (well, or the end one; the start we could jam into {{header}}); or, equally iffy but tempting, getting MW to output what we need directly (can PRP manipulate the whole page when it's being invoked?). After all, we just need a couple of empty containers inside #mw-content-text at parse/render time, and then pagenumbers.js could mostly be reduced to a simple stylesheet-switcher. And if we could rely on the containers being there, I think we could even split the page numbers stuff from the dynamic layouts stuff. Xover (talk) 16:50, 30 July 2021 (UTC)

@Xover: it’s possible (possible) that this could be done by the Wikisource extension (not PRP, as we really want control of all pages even if they don't use <pages/>) which could construct all that crap server-side. It is indeed a longer term goal of mine to move the layout stuff into that extension for wider reuse and better "performance" (in particular not having the flicker as the JS comes online). But first, I’m trying to generally sort all the junk out so that I can even visualise what is needed for that to happen. Inductiveload—talk/contribs 17:01, 30 July 2021 (UTC)

We'll get some flicker no matter what for users that have something other than the default layout set. But if we just apply a different stylesheet and don't force the browser to modify the DOM we'll get the benefit of all the browser's built-in optimizations for this kind of thing. Hmm. Actually… this gives me an idea. Maybe we don't actually need all those #fooContainers? The page numbers are absolutely positioned anyway so maybe we could just stuff them over in the gutters. That'd (and moving the CSS out of JS) make this a whole lot cleaner. Maybe. Xover (talk) 18:25, 30 July 2021 (UTC)

@Xover: CSS out of JS: FYI as of this morning, the CSS has indeed been moved out of the JS for exactly that reason. Now, the gadget just sets a class dynlayout-{id} on a top element and leaves the rest to the browser.
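
Roughly what that looks like from the stylesheet side, assuming a hypothetical layout id example and assuming the class sits on an ancestor of #mw-content-text (the real ids and rules live in the gadget's own CSS): each layout is just a set of rules scoped under the class the gadget sets, so switching layouts only swaps one class on the top element.

```css
/* Applies only while the gadget has put class="dynlayout-example" on the top element. */
.dynlayout-example #mw-content-text {
    max-width: 36em;      /* e.g. a narrow, book-like column */
    margin-left: auto;
    margin-right: auto;
}
```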

And, in theory, if the Wikisource extension is administering the layouts rather than gadgets, it's possible for layouts (and the CSS) to be served in-place according to a user option. Not sure it's worth the effort, but it's not unthinkable. Inductiveload—talk/contribs 18:32, 30 July 2021 (UTC)

Porcine lipstick

aka. Module:Table style. Test cases very welcome at Template:Table style/testcases. Trigger for finally getting off my posterior on this: 11544324. The performance improvements should actually be pretty massive (even with the horribly inefficient table copy that's repeated on every invocation), and finally empty out Category:Pages where node count is exceeded. Xover (talk) 16:40, 30 July 2021 (UTC)

That's a clear example of where TemplateStyles provides a major win.

If I had to do that table, I’d use {{TOC begin}} and co, since they delegate nearly all of their formatting to TemplateStyles via classes on each row of the table. Alternatively, an index- or page-local CSS can be set up and that "raw" table targeted with a class like ._mayan_toc.
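
A sketch of the index-local option, using the ._mayan_toc class suggested above. The property values, and the assumption that page numbers sit in the last column, are placeholders. One rule set replaces the same style= or {{ts}} repeated on every cell:

```css
/* One rule set in the Index styles.css instead of a {{ts}} call on every cell. */
._mayan_toc {
    margin-left: auto;
    margin-right: auto;
    border-collapse: collapse;
}

._mayan_toc td {
    padding: 0 0.5em;
    vertical-align: top;
}

/* Assumed: page numbers in the last column, aligned right. */
._mayan_toc td:last-child {
    text-align: right;
}
```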

Repeated use of either {{ts}} or style= is a code smell now we have TemplateStyles. {{ts}} in particular has gotten way out of hand since it was written way back when. Inductiveload—talk/contribs 17:07, 30 July 2021 (UTC)

Oh, yes, it’s definitely lipstick on the pig. This is just a stop-gap until we come up with a plausible alternative and to make existing pathological uses not blow up. Xover (talk) 18:05, 30 July 2021 (UTC)

Oh, and here are some numbers illustrating the point:

Metric	Template (current)	Lua (sandbox)	Limit
CPU time usage	7.862 seconds	3.486 seconds	N/A
Real time usage	7.898 seconds	3.549 seconds	N/A
Preprocessor visited node count	1,001,301	32,601	1,000,000
Post-expand include size (bytes)	684,391	384,556	2,097,152
Template argument size (bytes)	103,184	24,298	2,097,152

That's on one of the pathological pages that currently blow up (due to exceeding the node count limit), just by calling the sandbox version instead. --Xover (talk) 21:23, 30 July 2021 (UTC)

MC stats borked again

cf. Talk:Main Page#Lua error on Main page. Xover (talk) 06:25, 1 August 2021 (UTC)

Band aid applied. Xover (talk) 06:37, 1 August 2021 (UTC)

@Xover: Darn, seems it always finds a way to fall over on me, and it's always at one of the weekends when I'm not allowed to be up at 1am to watch it! The stats seem to have been updated at 00:00 by the cron job, not sure why it didn't propagate to the main page - perhaps a purge would have been sufficient? Maybe I'll add a pre-emptive purge to the update script.

All that said, the future PRP Lua stuff may (may!) make some of this tedious bottery obsolete. I hope!

Thanks for the fix in the meantime; it adds a nice extra layer of defence in depth to avoid racing. Inductiveload—talk/contribs 11:38, 1 August 2021 (UTC)

The midnight page creation created an empty data structure (the actual data was added this afternoon), and the code consuming it assumed it'd contain at least one entry (it tried to dereference it). The check I put in is a brute force bail early, so it's possible you can get more nuanced results by moving the check later in the flow. e.g. I'm not sure what the "Current sprint" is supposed to contain, but there might be a more graceful failure than "No sprint found". Xover (talk) 11:54, 1 August 2021 (UTC)

The sprint is supposed to be a "sub-month" focus, but I'm pretty sure no one cares about it, we don't really have the traffic to be able to direct energy like that. Inductiveload—talk/contribs 21:30, 1 August 2021 (UTC)

Gilding the lily

Not that it's really needed, but since I happened to be poking about in the MC stuff today…

If you set the cover images on e.g. Wikisource:Community collaboration/Monthly Challenge/August 2021 to be as wide as the item container (currently 15em) they will be larger, more legible, and visually more impactful with no particular downside. The margin between the image and the item-box border just isn't needed. Xover (talk) 07:03, 1 August 2021 (UTC)

@Xover: there's (kind of) a reason for that. The images are actually served at a fixed height (x300px), so they don't always have consistent widths. This is done for more consistency with neighbours in a row. It's not perfect, but I think it looks "OK". Adding a bit of padding hides the visual effect of some images hitting the edge and some not.

Feel free to tweak it as you like, though! Inductiveload—talk/contribs 21:29, 1 August 2021 (UTC)

Uhm. They may be requested with fixed height, but they're rendered with fixed width and variable height. But, in any case, it was just a quick "it'd look prettier that way" that struck me when I was really looking at the MC for the first time. The MC stuff is really rather uncommonly slick for enWS to begin with, so further tweaks aren't exactly pressing. But if you ever go back to tweaking it, you might want to keep the option in mind. Xover (talk) 08:14, 2 August 2021 (UTC)

@Xover: Hmm, yeah. It's been a while. To be clear: are you suggesting something like this: phab:F34573533? Inductiveload—talk/contribs 08:22, 2 August 2021 (UTC)

Yep. Xover (talk) 08:34, 2 August 2021 (UTC)

That actually does look nicer, and I just realised border-radius works on <img>s. I'll put it on the list (but feel free to get impatient and hack it up as you wish: Template:MC-Section/styles.css). Thanks for the idea! Inductiveload—talk/contribs 08:36, 2 August 2021 (UTC)
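
A sketch of the suggestion, with a hypothetical selector for the cover image inside the item box; the real selectors are in Template:MC-Section/styles.css. Since the covers are requested at a fixed height of 300px, filling the 15em width means the browser rescales them, so heights will vary from item to item.

```css
/* Hypothetical selector; adapt to the real markup in Template:MC-Section/styles.css. */
.example-mc-item img {
    width: 100%;           /* fill the item container (currently 15em) */
    height: auto;          /* keep the aspect ratio, so heights vary */
    border-radius: 0.5em;  /* border-radius also applies to img elements */
}
```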

Old works in this month’s Monthly Challenge

The progress bars for the 3-month works don’t show up, because the system doesn’t recognize works that old (I think). Anyway, they should be removed; could you comment them out from the module? I don’t want to comment out anything important by accident. TE(æ)A,ea. (talk) 19:33, 1 August 2021 (UTC)

@TE(æ)A,ea.: it's actually because they're not categorised as Category:Monthly Challenge (August 2021). I had to rush it off last night and this morning (I had thought it was the 30th!), so I didn't want to trash it without looking. I'll take a look at it now.

We're also a bit short on new works this month - feel free to chuck some more in quickly. 21:18, 1 August 2021 (UTC)

That makes sense, although it is a strange limitation. Perhaps some short works from here? Some of the listings up now look good, but it’s better to work on items with more general support. (Only the three listed with “(transcription project)” instead of “(external scan)” were chosen as works for that project.) TE(æ)A,ea. (talk) 21:23, 1 August 2021 (UTC)

It's just how the bot works: it uses the category members as a data source. Progress is being made towards being able to render progress bars without a bot, but for now, it is what it is.
I didn't have time to do a decent selection. I'll go back through the noms pages and your list, but many of them don't have indexes and I didn't have time to set them up. I kind of dropped the ball this month on being ready for change-over. Sorry!
If you set up any suitable-looking indexes, just chuck them in. Inductiveload—talk/contribs 21:27, 1 August 2021 (UTC)

Thanks for creating the Letters. However, the text layer is offset. Could you fix that, please? (They are set one behind: the text for the title page is found on the page facing the title page, and so on.) TE(æ)A,ea. (talk) 01:56, 2 August 2021 (UTC)
- @TE(æ)A,ea.: Urgh, phab:T268246 and friends strike again. Re-encoded now: Index:Letters from a farmer in Pennsylvania - Dickinson - 1768.djvu. Inductiveload—talk/contribs 09:16, 2 August 2021 (UTC)
  - Thank you! TE(æ)A,ea. (talk) 14:11, 2 August 2021 (UTC)

Ambox vs. ombox

Ambox has WP-specific stuff along for the ride, so I’ve been deliberately migrating things to ombox. Xover (talk) 22:13, 3 August 2021 (UTC)

@Xover: Oh right. I just noticed that ambox gets excluded from the dynamic layouts and ombox does not (which makes sense, since {{missing image}} is an ombox). Probably the docs for ombox need an update, since they currently have a prohibition on use in article space. And the notice templates either need an interposer template or the various classes applied (noexport, layout exempt and probably noprint). Or maybe just make ambox that interposer? Inductiveload—talk/contribs 22:22, 3 August 2021 (UTC)

Actually on closer inspection, aren't {{ambox}} and {{ombox}} just invoking the same module? Inductiveload—talk/contribs 22:57, 3 August 2021 (UTC)

It's been a while and I'm insufficiently caffeinated so caveat brainfog… As I recall it's the same module but ambox is very specifically for "Wikipedia Article-space", and not for "ns:0". Mainly, IIRC, it's got some WP-specific maint. cats. There's some changes that I was hoping to get done upstream, but it ran aground on complete lack of interest from the enWP folks. And since it required grokking metatables and custom methods, sandboxing a patch for it ended up getting dumped on my todo list. In any case, all of the box types are basically interchangeable, except that ambox has enwp baggage and the multibox (whose name I can't recall ottomh) has a bit too much logic that we probably don't want most of the time (I can't recall if the logic was enWP-specific or not). Xover (talk) 06:11, 4 August 2021 (UTC)

Overeager layouts

I think we need to back this out, because now dynamic layouts are active on redirects, versions and translations, and disambiguation pages. I don't really see any workable alternative to having it trigger off the presence of PRP: we could have a suppress flag emitted by {{versions}} and friends, but that would require a whole infrastructure around tagging redirects and otherwise invite playing whack-a-mole with edge cases and exceptions. Xover (talk) 06:44, 6 August 2021 (UTC)

I actually had most of that ready locally in my experimental Vue-based layouts code, so I ported it back. Redirects are easy (they have a flag in mw.config). And the main "special non-content headers" can have classes to disable layouts. I was debating using .subNote, but that's a pretty non-obvious heuristic so I went with an explicit class on header after all.

There might be a few edge cases in content where the layouts look bad but:

That actually is a red flag that the page will be a hot mess on mobile (exhibit A) — if it looks bad in Layout 2 at 36em, it'll look even worse in a 320px mobile screen.
I think it's more important to allow layouts that enable more comfortable reading for the millions of words of non-scan-backed texts than to artificially limit them based on the presence of scans (which is almost entirely orthogonal to layouts).
It's pretty easy to either set a default layout or just let the user choose in the rare edge case. I can't actually think of a valid example that's not actually a symptom of non-responsive layout, though.

Also note that the layouts are now smart enough to not reserve space for page numbers where there aren't any (previously they would reserve 3em on each margin). Inductiveload—talk/contribs 07:19, 6 August 2021 (UTC)

I am convinced. At least until non-hypothetical moles start sticking their head up on a regular basis. :) Xover (talk) 08:10, 6 August 2021 (UTC)

Float left....

You recently updated float-right.

Do you plan to also review {{float left}}, {{float right/s}} {{float left/s}}? etc.?

ShakespeareFan00 (talk) 09:24, 6 August 2021 (UTC)

At some point. BTW, we should also harmonise the parameters - the defaults are different for no good reason. Inductiveload—talk/contribs 12:54, 8 August 2021 (UTC)

King Alfred's Old English Version of St. Augustine's Soliloquies alignments

I have done similar work for the Anglo-Saxon Chronicle; which I tested here (if it helps). TE(æ)A,ea. (talk) 19:01, 11 August 2021 (UTC)

@TE(æ)A,ea.: That is useful, thanks. For a new approach to two-sided texts, check out {{parallel pages sections}} which is a bit fiddly to set the sections up for, but seems fairly effective in action. Inductiveload—talk/contribs 18:57, 12 August 2021 (UTC)

1. for “prefix_l the section prefix for the right side” do you mean prefix_r? 2. Unfortunately, ASC has six side-by-side sections, spread out over two pages. TE(æ)A,ea. (talk) 19:42, 12 August 2021 (UTC)
1) Yes, yes I do. 2) yeah, that's a pretty extreme case indeed. I'm not really sure how to approach it any better than you already have! Inductiveload—talk/contribs 19:44, 12 August 2021 (UTC)
- Thanks for the compliment. I should probably create a one-work template to help transclude the sections, so it doesn’t take up so much space. (Something like <pages index="The Anglo-Saxon Chronicle according to the Several Original Authorities Vol 1 (Original Texts).djvu" include={{{1}}} onlysection={{{2}}} /> for the individual parts.) TE(æ)A,ea. (talk) 20:10, 12 August 2021 (UTC)
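A one-work wrapper template of this kind only passes two values through to a fixed `<pages>` tag. As a rough illustration in Python (the function names are hypothetical, and quoting the attribute values is an assumption added so that section names containing spaces survive), the tags for several side-by-side sections of one scan page could be generated like this:

```python
# Index filename taken from the discussion above.
INDEX = ("The Anglo-Saxon Chronicle according to the Several Original "
         "Authorities Vol 1 (Original Texts).djvu")


def pages_tag(page, section, index=INDEX):
    """Wikitext for one section of one page: include={{{1}}} onlysection={{{2}}}."""
    # Values are quoted here (an assumption) so names with spaces stay intact.
    return f'<pages index="{index}" include="{page}" onlysection="{section}" />'


def parallel_tags(page, sections, index=INDEX):
    """One <pages> tag per side-by-side section found on the same scan page."""
    return [pages_tag(page, s, index) for s in sections]
```

Each tag would then go into its own column of whatever table or layout holds the parallel texts.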

MediaWiki:Gadget-BugStatusUpdate.js

Could you update MediaWiki:Gadget-BugStatusUpdate.js with this sandbox?

Extra quality control and other feedback is always welcome, if time permits and interest obtains. Xover (talk) 14:46, 9 August 2021 (UTC)

@Xover:

Done, sorry I missed it for a bit! Inductiveload—talk/contribs 18:55, 12 August 2021 (UTC)

No worries. Thanks. Xover (talk) 06:49, 13 August 2021 (UTC)

And thanks for fixing it, it was vaguely annoying me for quite a while! Inductiveload—talk/contribs 07:59, 13 August 2021 (UTC)

Well, it was more "got carried away" fixing: what actually broke that is now fixed was just an extra span in the markup that necessitated a selector tweak. But the waters are muddied by the backend on Toolforge sometimes returning 500 errors (which is not fixed) so by the time I figured out what was going on I'd already rewritten the lot. The backend proxies the requests to Phabricator's API because Phab doesn't support JSONP/CORS and requires a manually assigned per-user API token (so we can't talk to Phab directly from web browser JS), and the maintainer of the tool hasn't edited since 2019 (ex-WMF employee that stopped editing when they switched jobs or something). I may try to set up a replacement eventually, if the bitrot gets sufficiently bad, but not right now. Xover (talk) 08:54, 13 August 2021 (UTC)

officious links

I made a request, you pasted a bunch of officious links but there is no person involved between here and there.

You pointed me to a deadend wall.

How was it that they started to do structured data on all the images and didn't think of what the purpose was or asking the wikis that use them? Oh, probably they made a bunch of officious crap links that they could point real people at.

Indexed images do not compare with rgb. You either knew that and wanted to use jargon to "win" some war you are having or you didn't know that and should not be arguing for this pixel image or another. I am not at war with you; we work together.

Now, this is a serious question: Are you the same person who uploaded those beautiful SVG long ago? If you are not, sure, that is okay. If you are pretending to be that person, probably that is "okay" in what is allowed and not allowed, but not "okay" in the moral sense of right and wrong.

But I am done here. If you know of a person who can author an uploader for the commons, do point that person in my direction. Better to say NO! than to aim a person at a wall of officious crap.

Wiki is an FOD. If in all of these years, you do not know that, I am sorry for that.

VERY SORRY TO HAVE WASTED YOUR TIME!!--RaboKarbakian (talk) 11:53, 19 August 2021 (UTC) (Field of Dreams)

@RaboKarbakian: You didn't actually make any request, you were just talking obliquely about generalities. And I didn't paste any links to anything resembling guidelines or rules, so I literally do not know what you are talking about with "officious links".

I am the same person I was 10 years ago. I am not impersonating myself. If you suspect me of usurping someone else's account, please always feel free to consult a Checkuser to allay any concerns.

I think there is not much I can do to explain more about images since you clearly have some deeply-held misconceptions about image data and how the common formats work. Inductiveload—talk/contribs 12:33, 19 August 2021 (UTC)

histogram request

Let's look at the histogram for this image: You want to treat illustrations as photographs, then really do that.

. --RaboKarbakian (talk) 13:28, 19 August 2021 (UTC)

Good example. As you see, there is quite a lot of information content in this image that is not black or white: phab:F34605970. Obviously, the histogram is bimodal due to the large amount of pure white and black, but there are actually 285 unique colours in this image (or at least the 320px thumbnail of it), so it cannot be losslessly encoded as a GIF. The "non-black, non-white" content (i.e. the histogram from values 1 to 254) is either the red digit, or the antialiasing around the digits (which is why they look nice and smooth). Inductiveload—talk/contribs 13:39, 19 August 2021 (UTC)
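To make the "285 unique colours" point concrete, here is a minimal Python sketch that counts distinct pixel values and checks whether they would fit into a single GIF palette. It assumes the image has already been decoded into a flat list of pixels (RGB tuples or 8-bit grey integers); decoding is out of scope and the helper names are hypothetical. A GIF colour table has at most 256 entries, and if transparency is needed one of those entries is spent on the transparent index.

```python
from collections import Counter

# A GIF colour table holds at most 256 entries; if the image needs
# transparency, one of those entries is used as the transparent index.
GIF_PALETTE_SLOTS = 256


def unique_colours(pixels):
    """Distinct pixel values in a decoded image (RGB tuples or grey ints)."""
    return set(pixels)


def fits_gif_palette(pixels, needs_transparency=False):
    """True if every distinct colour fits in one GIF palette (lossless as GIF)."""
    available = GIF_PALETTE_SLOTS - (1 if needs_transparency else 0)
    return len(unique_colours(pixels)) <= available


def grey_histogram(grey_pixels):
    """256-bin histogram of 8-bit grey values (list index = grey level)."""
    counts = Counter(grey_pixels)
    return [counts.get(level, 0) for level in range(256)]


def midtone_fraction(grey_pixels):
    """Share of pixels that are neither pure black (0) nor pure white (255)."""
    hist = grey_histogram(grey_pixels)
    return sum(hist[1:255]) / len(grey_pixels)


if __name__ == "__main__":
    # 285 distinct RGB values: too many for a single GIF palette.
    many = [(r, 0, 0) for r in range(256)] + [(0, g, 0) for g in range(1, 30)]
    print(len(unique_colours(many)), fits_gif_palette(many))  # 285 False
```

A bimodal histogram with a non-zero `midtone_fraction` is exactly the situation described: mostly black and white, but with antialiasing and colour in between.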

NOTHING can be losslessly encoded as gif! That is just jargony crap! The choice of format should be based on the purpose of the image. What job is the image going to perform. Where does it display and why. And having an ebook maker that turns hogs into light weight niceties for small devices is really a good thing because image format is not a politic, even though, apparently it is being used that way. PNG, SVG, JPG are not team sports. They are image formats. FGS!--RaboKarbakian (talk) 13:49, 19 August 2021 (UTC)

A bitonal image certainly can be losslessly encoded as a GIF (inefficiently: CCITT in a TIFF would be an even better choice depending on purpose), as well as any image that uses fewer than 256 colors, plus, optionally 1 bit transparency.

In this case, PNG is the format of choice for rasterising an SVG, since the ability to have 8-bit transparency makes it look much better. This is one reason why all SVGs rasterised by MediaWiki are PNGs, not GIFs (the other reason is that a 256-slot index is far too small to look good for most images that aren't greyscale).
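The transparency point can be illustrated with a small Python sketch. It assumes a black line antialiased onto a white background, and an encoder that turns 8-bit alpha into GIF's on/off transparency with a simple 50% threshold; that threshold is an assumption (real encoders may dither or matte against a background colour instead), and the function names are hypothetical.

```python
def to_gif_transparency(alpha_values, threshold=128):
    """Collapse 8-bit alpha (0-255) to GIF-style on/off transparency.

    GIF can only mark a pixel fully transparent or fully opaque, so every
    partially transparent (antialiased) pixel is forced to 0 or 255.
    """
    return [255 if a >= threshold else 0 for a in alpha_values]


def composite_grey(foreground, alpha, background):
    """Blend an 8-bit grey foreground over a background using 8-bit alpha."""
    return round((foreground * alpha + background * (255 - alpha)) / 255)


if __name__ == "__main__":
    edge = [0, 64, 128, 192, 255]  # alpha values across an antialiased edge
    png_like = [composite_grey(0, a, 255) for a in edge]
    gif_like = [composite_grey(0, a, 255) for a in to_gif_transparency(edge)]
    print(png_like)  # [255, 191, 127, 63, 0]: smooth ramp
    print(gif_like)  # [255, 255, 0, 0, 0]: hard, jagged step
```

The 8-bit alpha keeps the intermediate greys of the edge; the 1-bit version throws them away, which is what produces jaggies on rasterised line art.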

There is already a task (phab:T287854) raised for using JPG rather than PNG on export, which is blocked by an issue in the thumbnailer used by Mediawiki, so please don't have a go at me about it. Inductiveload—talk/contribs 14:01, 19 August 2021 (UTC)

a uploader for wikisource at commons

⇈ The request ⇈

Background, or what caused the request/need

I had a bot tagging my uploads with structured data. "Inception date" which is so meaningless for a publication, but I was annoyed and mostly just justifying my annoyance until I uploaded a photograph that was reprinted in a book. The photo from the late 1800s, the book early 1900s and I thought "THERE! Publication date and Inception date!" An example that made it justified annoyance.

And really, if the structured data on the images is ever going to be used, publication date really does have meaning for the images whose display destination is here.

The outcome of the negotiations regarding the date in the structured data was that I use the book template. The bot would not put an inception date on the images that were using the book template. At commons, I then installed the AC/DC gadget which will add structured data to all of the images in a category. And voilà! Beautiful categories, informative structured data, etc.

Benefits

The book template has two nice things for me. First, I put the scan on wikidata, so simply by putting the Wikidata Qnumber into Wikidata = I get all of the other fields filled in with the exception of description, permission and image page. It provides an image of the scan and a link to the Index, etc. The second thing is the Image page, which will open the scan to the page that the image was found on. See File:Complete Course in Dressmaking-008-a.gif

Of course, I still get screwed if I forget and put the wikidata number on the book template before getting the Index page filled out here....

Other things the uploader could do

If the uploader could put the images into a sensible category also. I think that people are too shy to make sensible categories, or too busy with their book, or too unknowing of the ways of commons, or too filled with recent consensus, but a category structure that matches the main space structure here is simple and sensible with some additional (crap) identification in the upper cat due to the fact that commons might have much more than en.wikisource does.

So, summary (leaving out "Why I don't want to communicate with you", which was also left out of the communique):

first summary of this request

An uploader for sourcerers that uses {{Book}} and not {{Information}} at commons
Structured data befitting source images
Sensible categories

more words summary of this request

Can you write an uploader for sourcerers that uses {{Book}} and not {{Information}} at commons
Can it add Structured data befitting source images
Can it add Sensible categories

The old admins there (by old I mean active there for years) seem to like my cats and subcats (useful for books containing subject matter that could go elsewhere like the fairy tales and the short story books and the mags). So, I recommend that, because I haven't had problems there with it and it does have some sense.

<rant>Also, I read your tutorial for images here. Do you hate wikisource? Do you hate wikisourcers? Have you ever tried the Decompose plug-in? Do you think that they used grey ink when they published? And, sorry, usually I am nicer, but right now, these are my real thoughts, so really sorry. Also, where is the SVG tutorial which you should probably be really good for writing? </rant>--RaboKarbakian (talk) 14:42, 18 August 2021 (UTC)

@RaboKarbakian: Well there's a bit to unpack here. Let me start by saying that I really am not sure what you are trying to ask here, or even if this is a question at all, or just another rant about...something. But Why I don't want to communicate with you and Do you hate wikisource? Do you hate wikisourcers? leads me to wonder if a constructive dialogue would emerge even if I could understand the above, but let's see.

I have made a tool for "semi-batch" uploading of images for Wikisource: https://ws-image-uploader.toolforge.org/ It currently uses {{information}} rather than book. It could really do with more WD integration, which will eventually come along.

If you are talking about SDC, then I am not the person to ask, as I have no idea what is going on there and no one cared at VPT and the SDC Modelling talk page when I asked about what WSIU should set. When they decide what SDC is for and how to use it then maybe I'll bother. WSIU has a category field, and one day may be able to grab it from the WD "commons category" field for an edition.

If you have a better way to do the images, then feel free to write it up yourself, but I maintain crashing the black point is not very respectful of the images, even if they do then perceptually "pop" more. The books were of course printed in "black" (obviously not perfectly black itself) but there are still variations in the printing - even black and white printing has shades of grey due to the ink, paper and plate texture. Compare the left and right: you have deleted all the small variations in the body and head of the woman:

Left: "crashed" black point GIF: 256-slot colormap, one slot of which serves as the transparent index (max 8 bits/pixel). Colormap slots 216-255 are empty, which reduces it further. Histogram: phab:F34600168
Right: contrast stretched and grey point adjusted, but (almost) no clipping. PNG, 8-bit GREYA: 8 bits/pixel value + 8 bits/pixel transparency, total 16 bits/pixel (max). Histogram (pre color-to-alpha): phab:F34600171

As you can see from the histogram, in the right image there is "information" throughout the spectrum from white to black. Now, you can argue that a lot of that variation is actually JPG/JP2 noise in image, and you'd be right. However, I think there's still some amount of original information there and I have tried to preserve it within reason, instead of going for a quasi-bitonal output like yours. There's certainly a spectrum of choices to make here, progressing from "minimum adjustment", which preserves most of the information at the expense of contrast and inclusion of compression noise, right up to past your version where there is only black and white (and pixels are reduced from their original 8 bits to 1 bit of information content). There is no right answer, but my personal preference is for the least destructive method, at least when I do not bother to make a "master" and "display-optimised" variant.

BTW, I do not think the animated version belongs at Wikisource, unless (maybe) as an annotated version.

Also, where is the SVG tutorial which you should probably be really good for writing? Probably somewhere at Commons until WS starts having any works that have SVG images in them.Inductiveload—talk/contribs 16:14, 18 August 2021 (UTC)

The animation is fun. I was careful to avoid rude, as fun has more staying power to it. Some of the images I have worked on make great coloring book images also. And I looked at wikibooks, so, please don't go there with your suggestion. A software solution would be nice, to allow a choice of fun vs. actual (I did not type curmudgeonly). I saw a gif contest once, and someone had made a gif of a line-drawing of a crab walking out of the page it was on, some scientific book. It was so beautiful and cool and it stuck in my mind. Can <div>s be used for image changes? I can style, but in current company, I am only 'good' at it, and not masterful of it.

There is good reason to maintain "dusty and musty" works as "dusty and musty", but what you are calling clipping I am thinking it is more like ink-bleeding. Where in the printing process, sharp lines become smudges. So, that is our difference there.

SVG:

A recent image that really should be SVG

If this publication had had more images, I might have taken it to the SVGers at commons. As it is, I don't have root access to this computer, which is fine, it is clearly not my os and a warning for me not to share things with just anyone, and Inkscape is not installed here as the intended user just needed GIMP for a facebook portrait. If on my own computer, a tutorial very well might have been used for this image which would very clearly be nicer in SVG.

Who is the person who is representing en.wikisource at the commons then? The inception date meeting was in 2019. The bot owner told me that pasting inception date on all of the images brought out a lot of different dates that would be more useful, publication date being one of them. So, who is the person here, in all officiousness possible, who is representing the needs of en.wikisource?

@Inductiveload: If my earlier request was unclear, please review this enumerated list. What I am typing now is actually the mishmash you claimed the above to be.

Also, if you think that I am ever using my tools or skills to in any way humiliate you, or compromise you or in any way offend you, do let me know. I will do anything within my small realm to fix it or make it so it is somehow less offensive to you.

Banter between artists is okay by me. The idea of having a "layout" based solution for fun vs. curmudgeonly is something that is not in my skillset, but wow, wouldn't that be great....

Also, it is not fair to compare histograms of gif vs png. 255 colors and one "not" is not fair for comparison.--RaboKarbakian (talk) 17:00, 18 August 2021 (UTC)

@RaboKarbakian: Nothing in that list is a question, so please be explicit with what you are asking. If you would like me to invent such a tool or system, for the first I will point you to https://ws-image-uploader.toolforge.org/, and for the rest, I'll just defer to Commons' guidelines, if and when they figure them out.

No "one" represents enWS at Commons. The "community" represents the needs of Wikisource, usually via the Scriptorium. If you are after a Wikisource "data union" for collective bargaining for something to be implemented at Commons or Wikidata, you should ask there. As for "inception date meeting was in 2019", "the bot owner" and "old admin there": obviously I am out of a loop or three, because I have no idea who or what you refer to.

You have misunderstood what I mean by "clipping". Clipping is where you adjust the levels so aggressively that pixels that used to have values between 1 and 254 are instead "clipped" to be either 0 (black) or 255 (white). Generally speaking, this represents loss of information and is often undesirable. Some clipping is intentional (for example, I will often "clip" out the brightest 5% or so of the histogram because that's usually JPG noise around edges on a white background and it's a quick and easy way to remove it). But I do try to avoid clipping that substantially alters the image content.
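A minimal Python sketch of a levels adjustment and the clipping it causes, assuming 8-bit grey pixels in a flat list. The percentile-based white point mirrors the "clip the brightest 5% or so" habit described above; the function names and the exact rounding are illustrative assumptions, not any particular image editor's implementation.

```python
def apply_levels(grey_pixels, black_point, white_point):
    """Stretch 8-bit grey values so black_point -> 0 and white_point -> 255.

    Anything darker than black_point or lighter than white_point is clipped.
    """
    if not 0 <= black_point < white_point <= 255:
        raise ValueError("need 0 <= black_point < white_point <= 255")
    span = white_point - black_point
    out = []
    for v in grey_pixels:
        stretched = round((v - black_point) * 255 / span)
        out.append(min(255, max(0, stretched)))  # clip into the 0-255 range
    return out


def clipped_fraction(grey_pixels, black_point, white_point):
    """Share of pixels that started as mid-tones (1-254) but end up 0 or 255."""
    clipped = sum(
        1 for v in grey_pixels
        if 1 <= v <= 254 and (v <= black_point or v >= white_point)
    )
    return clipped / len(grey_pixels)


def white_point_for_percent(grey_pixels, clip_percent=5):
    """White point that saturates roughly the brightest clip_percent of pixels."""
    ordered = sorted(grey_pixels)
    index = min(len(ordered) * (100 - clip_percent) // 100, len(ordered) - 1)
    return ordered[index]
```

Checking `clipped_fraction` before committing to a black or white point gives a rough measure of how much mid-tone information a given adjustment would destroy.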

For such a simple image, if you wanted to vectorise it, there are many, many tutorials out there, e.g. https://inkscape.org/doc/tutorials/tracing/tutorial-tracing.html and probably hundreds of YouTube videos. tl;dr trace the bitmap then tidy up with the node editor tool.

The animation might be "fun", but IMO it's not a faithful copy of the original (or even the original intent), it's entirely synthetic.

It's very fair to compare the histograms because it shows you a good representation of the information loss you have paid for your smaller file size with (via quantisation into an indexed GIF). Yes, the GIF will look pretty rough in the histogram, and there's a good reason for that - the format has introduced compression artifacts.

Inductiveload—talk/contribs 08:30, 19 August 2021 (UTC)

I have undone your incorrect edits to my comment above. The fact that the GIF format is incapable of representing more than 256 values (255 colors + transparent) is not a defense of it. In fact, it is the reason the format is usually not suitable: it can represent at most 256 values, because the transparent color occupies one of the 256 colormap entries. Compared to an 8-bit GREYA PNG, it throws away 65280 values out of a gamut of 8 bits color + 8 bits transparency = 65536 values. It's even less suitable compared to an 8-bit RGBA PNG, where the PNG has up to 32 bits per pixel (24 bits color + 8 bits alpha) and therefore can store roughly 4.3 billion unique values (at a cost of a filesize roughly 4 times larger, ignoring differences in compression algorithms: 32 bits/pixel vs 8 bits/pixel). This is why you rarely see GIFs these days: the image quality is so bad they are only used when the animation capability is more important. Inductiveload—talk/contribs 13:23, 19 August 2021 (UTC)
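The underlying arithmetic is easy to check. Note that a GIF's transparent colour is stored as one of its 256 palette indices rather than as an extra value, so the GIF ceiling is 2^8 = 256 values (8 bits per pixel). The Python below simply evaluates the powers of two for each format and ignores compression entirely.

```python
# Distinct values each format can hold per pixel, before any compression.
GIF_VALUES = 2 ** 8           # 256 palette indices; transparency uses one of them
GREYA_VALUES = 2 ** (8 + 8)   # 8-bit grey + 8-bit alpha
RGBA_VALUES = 2 ** (8 * 4)    # 8 bits each for R, G, B and alpha

discarded_vs_greya = GREYA_VALUES - GIF_VALUES
raw_size_ratio_rgba_vs_gif = 32 / 8  # bits per pixel, uncompressed

print(GREYA_VALUES, discarded_vs_greya)  # 65536 65280
print(RGBA_VALUES)                       # 4294967296
print(raw_size_ratio_rgba_vs_gif)        # 4.0
```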

Reply to updated request

Can you write an uploader for sourcerers that uses {{Book}} and not {{Information}} at commons: Yes, and I have already done so: https://ws-image-uploader.toolforge.org/; It uses {{information}} which is annoyingly general as a template, but it seems to me that {{book}} is also not necessarily the "correct" template since it still doesn't accurately capture the concept of "image from page X of book", and many of the metadata items copied from the book are not necessarily correct. For example, the illustrator is the author of many images, some anonymous artist and/or engraver did the title page logo and fleurons and so they should be the "author" of the image, even if they are not the "author" of the book.

"Ideally" such a template might be able to refer to the "containing" work via Wikidata, then add concepts such as "page number", illustrator of this image, "what this image is" (e.g. a drop cap) as well as things like captions.

Can it add Structured data befitting source images: Sure, if I can ever figure out what SDC is actually correct. The ball's firmly in Commons' court on that one. I've asked, twice, and received no reply. As you say, it does rather appear that SDC is a solution that no one has actually tried to apply to a problem yet.

I am also unclear if SDC can or should replace the information template. For example, if the caption was in SDC, does it need to be in the info template too? And the page number?

Can it add Sensible categories: Depending on what you mean by sensible, it already attempts to add sane categories based on what the image is (e.g. it will categorise a drop cap according to the letter).

So overall the answer is "already done" and "pending Commons figuring out what best practices for this even are". Inductiveload—talk/contribs 14:13, 19 August 2021 (UTC)

Comment {{Information}} takes other versions, for example other versions = {{other version|filename.djvu|page=nn}}

{{other version|File:The Irish land acts; a short sketch of their history and development.djvu|page=26}}

File:The Irish land acts; a short sketch of their history and development.djvu

— billinghurst sDrewth 01:39, 20 August 2021 (UTC)

F2 for Manual of Trees

I have the means to dig this out. It is for this: https://www.gutenberg.org/ebooks/46450

I smoothed some of it. It had a major formatting problem but reading the whole thing was not painful but more of a clash of my other experiences. I had to quit, so my SR was not very much.

It belongs here. One of the reasons I did not want to dig it out is because it has a bazillion images. Line drawings mostly, if not entirely, of leaves and stems. Very likely better as SVG than re-doing from the Gutenberg project. The Gutenberg images could be downloaded, but they are going to be small and disappointing; still, they would work as placeholders for an SVG project with many contributors. That is so much easier to think about.

I would like to hand it to you and then contribute. I don't know how to get it here though and would like to know before I go through the effort to dig it out.

It is a wonderful book.--RaboKarbakian (talk) 16:04, 20 August 2021 (UTC)

@RaboKarbakian: This has been "archived" at PGDP, so there's no way to get the F2 or later text version (even if that didn't upset them). The stated equivalent source is https://archive.org/details/manualoftreesofn00sargrich.

One day I will sort out a streaming HTML parser to deal with the final HTML results from PG which should be even better than the PGDP text. But I'm not there yet.

For the images, your best bet is probably the IA JPGs (or JP2s, even better), because the PG images are small and all have a light grey background for some reason. The images at the IA are in fairly good condition with a fairly even and light paper colour, though there is some bleed-through from the other side of the page.

Inductiveload—talk/contribs 16:53, 20 August 2021 (UTC)

I have the F2. I mentioned this before, months ago. You were uploading F2 here. If you pissed them off and don't want to do this anymore, then okay. But to repeat, I have the F2, so the only real answer (no advice, no suggestions) I need is either that you will put it here or that you won't put it here. Simple enough for even a bot to answer.--RaboKarbakian (talk) 17:08, 20 August 2021 (UTC)

MediaWiki:Proofreadpage index template problems

Could you undo the changes you made to this? It has caused a number of indexes (e.g. this one) to break. TE(æ)A,ea. (talk) 16:41, 17 August 2021 (UTC)

@TE(æ)A,ea.: This index appears to be working fine for me. Perhaps it was a caching issue? If you see another one, does a purge fix it? Inductiveload—talk/contribs 18:06, 17 August 2021 (UTC)

It does now. I found several the other day, but when looking for them I couldn’t find any. TE(æ)A,ea. (talk) 18:50, 17 August 2021 (UTC)

@TE(æ)A,ea.: Well, they do say there are only two hard problems in computing: naming, cache invalidation, and off-by-one-errors. There is definitely a tendency for something in Lua to be sensitive to stale caches. It's not something I've been able to definitively nail down, but certainly updating a module can necessitate a purge to get template docs refreshed sometimes. If I ever get as far as a reproducible example, I'll file a bug on Scribunto. Inductiveload—talk/contribs 18:53, 17 August 2021 (UTC)
- With caches, it’s nearly impossible to have consistent examples. I realized that this also happened when they reintroduced the Score extension: some files didn’t work, or caused problems (especially on transclusion), until the cache was refreshed. (Although, on that topic, there are still some old scores that don’t work with the new extension, like this one; but Score has always hated \header.) TE(æ)A,ea. (talk) 19:05, 17 August 2021 (UTC)

I finally got one here. The page (I think) still displays the error. TE(æ)A,ea. (talk) 17:46, 24 August 2021 (UTC)

@TE(æ)A,ea.: Thanks, confirmed and fixed. It's a funny one because editing the page and saving will fill in the Source field, but somehow at least some of the NIOSH pages are missing it. I have added a default. I think the other fields will be less finicky about being passed nothing.
Also not clear to me why that's not setting off Category:Pages with script errors. Could be to do with the Index page magic, since I don't see any script errors in there at all. Inductiveload—talk/contribs 19:30, 24 August 2021 (UTC)

Please revert the change you made to Polytonic template

Can you please revert the change you made to Polytonic template. The whole point of that template is to render accented Ancient Greek text properly. The change you have made has stopped Ancient Greek text appearing as it should. Was there any consensus about this change??? There was a whole discussion in the Talk page of Polytonic ensuring that the right fonts were included (is this now lost?)

Example: Page:EB1911 - Volume 26.djvu/905 uses {{Polytonic}}; this used to show Ancient Greek rendered so that it appeared very close to the printed text, now it does not. Please revert. DivermanAU (talk) 22:36, 26 August 2021 (UTC)

@DivermanAU: The discussion is at Wikisource:Scriptorium, under "{{greek}} vs. {{polytonic}} vs. lang=el vs. lang=grc". CYGNIS INSIGNIS 22:45, 26 August 2021 (UTC)

No _blank please

For a11y reasons, if not for my blood pressure. It's only active in read mode (vs. edit mode) so having to hit the back button is no big deal even in browsers that do not restore form controls. Xover (talk) 14:02, 27 August 2021 (UTC)

@Xover:

Done, sure, if you like. FYI, layouts are available from "submit" mode (i.e. when previewing). Hopefully your browser will save your work in that case (unless you close the help page rather than going back, then it's goneski, at least in Firefox).

But at least layouts are beginning to look a bit less crap, and the coupling between them and page numbers is slowly loosening. Inductiveload—talk/contribs 14:18, 27 August 2021 (UTC)

Yeah, I've noticed the changes flowing by on my watchlist, but haven't had time to look closely yet. I'll give a holler if I can provide any useful input, but meanwhile I'm just ecstatic at any progress there (my own ambitions in that regard have been thwarted by a combination of IRL and an inability to come up with a more perfect way to handle the three-into-one step and other stuff that causes a rerendering). The _blank thing is just a personal hobby-horse.

I was particularly happy to see the end of the ws_msg hack, which IMO should be killed with fire anywhere it still exists on the site. Xover (talk) 14:37, 27 August 2021 (UTC)

Greek vs Polytonic

The Greek template was set up for Modern Greek (el) and Polytonic for Ancient Greek (grc). Has this distinction been removed? --EncycloPetey (talk) 17:02, 28 August 2021 (UTC)

@EncycloPetey: Both Greek and Polytonic set "grc". There was, AFAIK, no specific template for Modern Greek before I created {{Modern Greek}} (aka {{el}}). Inductiveload—talk/contribs 22:29, 28 August 2021 (UTC)

That is odd, as I always understood the distinction had been made for the two. But I can see looking at the early history that it was not the original intent. The history on Wiktionary is so convoluted with moves and changes, I can't tell whether that distinction existed there. There may be some pages here where incorrect advice is given. --EncycloPetey (talk) 22:55, 28 August 2021 (UTC)

Common Sense (Monthly Challenge version) proofread!

The proofreading of this just finished. I noticed you had section markers, so I would like you to transclude along those lines. (I’m also a little busy now, so I can’t work too much on heavier-work stuff like transclusions.) TE(æ)A,ea. (talk) 02:03, 20 August 2021 (UTC)

@TE(æ)A,ea.:

Done sorry, forgot to get around to this. Please have a look and see if it makes sense as transcluded. Inductiveload—talk/contribs 20:33, 30 August 2021 (UTC)

Yes, this looks good. I got to that pirate book: no quote of the “Ballad of Long Ben.” Do you want to transclude What I saw in America, as well? (And, if you’re especially industrious, Letters from a Farmer in Pennsylvania?) TE(æ)A,ea. (talk) 20:39, 30 August 2021 (UTC)

I can do America and Letters, but might or might not be tonight. Thanks for checking Long Ben. ^_^ Inductiveload—talk/contribs 21:16, 30 August 2021 (UTC)

Module:ISO 639

Could you add kos = "Kosraean", nb = "Norwegian Bokmål", sa = "Sanskrit", please? These are the two major ones missing (plus a new one for an index I am creating now). TE(æ)A,ea. (talk) 01:35, 30 August 2021 (UTC)

@TE(æ)A,ea.:

Done, thanks! Inductiveload—talk/contribs 09:28, 30 August 2021 (UTC)

Table of contents of Popular Science Monthly

Hi. I noticed that the table of contents of the PSM volumes were moved? Do they exist at all?— Ineuw (talk) 05:29, 30 August 2021 (UTC)

Apologies for disturbing, I found it. — Ineuw (talk) 05:46, 30 August 2021 (UTC)

ppoem: right-aligned stanza not worky

cf. Page:The Story of The Other Wise Man (1920).djvu/18

For some reason a (block)right-aligned stanza is getting a computed margin-left:0 and is showing up centered on this page. I'm not sure I can recall this particular thing ever working, so it may have always been broken. Possibly the outer poem block is rendering the inner stanza block alignment moot? Maybe this is a case where the technical guts should rather be exposed to the user in /doc as "Don't do that, align the whole poem instead"?

I'll dig further at some point (head's not in the right space for it just now), and it's a problem that'll keep just fine (no hurry), but figured I'd drop a note for tracking / aid to recall. Xover (talk) 10:08, 29 August 2021 (UTC)

Incidentally, having finally made the effort to put {{sbs}} and friends out of use, I was curious so I checked the transclusion counts for {{ppoem}}. It's now sitting at just over 1k transclusions in Page: and 400 in mainspace. All the ones I've done have been intuitive, done what I expected, and had very few weird interactions with other necessary templates. I've dropped notes here for almost all issues encountered, and I think most of those are either unfixable or should not be fixed. In other words, I think for the stage it's in it is essentially stable. It probably needs more baking time, looking for weird edge cases, before going mainstream without caveats, but we can probably tone down the big dire warnings in the docs.

Biggest outstanding hurdle before full production, as I see it, is making a call on whether going full Extension is worth it at some point. If that's a possibility it should probably happen before flogging it to the variety of contributors (conversion may be slightly painful, and unlearning the habits will be tough for some users); but contrariwise, if we're sure we won't be going that route I think the current version is in excellent shape.

And my experience using it so far suggests it'll be a great boon both to users and technically. For actual poetry, both new uses and replacements for {{sbs}} have been smooth sailing. Xover (talk) 07:30, 2 September 2021 (UTC)

@Xover: Well, I'm glad you're enjoying it. I think pretty much all the worst pinch-points are handled now.

In terms of making it into an extension, I really don't know. The deathly slowness of submitting code to Gerrit in general makes me really not want to get involved if we can avoid it, plus it would take weeks to deploy fixes. The thing about the template as it is is that it is naturally machine-readable (or the module couldn't parse it). This means that there must always be a direct mapping to any equivalent implementation later. So if we do move to an extension one day, it "should be" a straight script to move things over.

The last thing I think we have an issue with is that the drop initials don't export well to Koreader when there's hanging indents (which means they probably will go sideways on other e-readers). Inductiveload—talk/contribs 16:48, 2 September 2021 (UTC)

@Xover: You could try something along these lines:

Who seeks for heaven alone to save his soul,
May keep the path, but will not reach the goal;
While he who walks in love may wander far,
Yet God will bring him where the blessed are.

Londonjackbooks (talk) 11:53, 29 August 2021 (UTC)

@Londonjackbooks: Thanks for the tip! However, in this particular instance I was just noting an issue with an experimental template that Inductiveload has been working on. It's very good at poem formatting, and I'm hoping it'll end up making life much easier for everyone once it's done, so while I'm trying it out I'm giving them feedback on issues encountered (even things that maybe shouldn't work, but that my head thought made sense at the time). Xover (talk) 14:34, 29 August 2021 (UTC)

Ah, gotcha :) Londonjackbooks (talk) 16:22, 29 August 2021 (UTC)

Cover parameter in the Header template

Hello. I have noticed that you added information about a "cover" parameter to the documentation page {{Header/doc}} about 5 months ago. However, I failed to find that parameter in the {{Header}} itself. May I ask about your intentions with this? Just curious. Thanks! --Jan Kameníček (talk) 10:19, 5 September 2021 (UTC)

@Jan.Kamenicek: the microformat data that WS-export uses to set the metadata, of which the cover image is a part, is something that's now handled in Module:Header, rather than Template:Header. Specifically, the microformat is done at line 156 and the categorisation is done at Module:Header#L-290. Hope that helps! Inductiveload—talk/contribs 18:57, 5 September 2021 (UTC)

I am asking because I was experimenting with the new parameter to see what it does, and did not see anything. Following the advice in the documentation page I added cover=Czechoslovak stories.pdf/7 to the header template of Czechoslovak Stories just to see what happens, saved the page, but did not see any change. --Jan Kameníček (talk) 19:08, 5 September 2021 (UTC)

@Jan.Kamenicek: There is no visible output on a normal page, the image is only used for setting the cover of exports like EPUB (and probably MOBI), so when you view it in a "shelf" view on an e-reader it shows the cover rather than just a textual one.

If you look at the HTML source, you'll see something like

<span id="ws-cover">Czechoslovak stories.pdf/7</span>

This is what the export tool reads. Inductiveload—talk/contribs 19:24, 5 September 2021 (UTC)
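For illustration only, here is a minimal sketch of how an export tool could pick the cover reference out of the rendered HTML, assuming nothing beyond the `<span id="ws-cover">` markup shown above. This is not WS-export's actual code; the names `CoverFinder` and `find_cover` are hypothetical, and only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not change the depth.
VOID_TAGS = {"br", "hr", "img", "wbr", "input", "meta", "link"}


class CoverFinder(HTMLParser):
    """Collects the text of the first element whose id is "ws-cover"."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # > 0 while inside the ws-cover element
        self.done = False  # set once that element has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested element inside the cover span
        elif dict(attrs).get("id") == "ws-cover":
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # ignore any later ws-cover elements

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def find_cover(html):
    """Return the cover reference (e.g. "File.pdf/7"), or None if absent."""
    finder = CoverFinder()
    finder.feed(html)
    finder.close()
    cover = "".join(finder.parts).strip()
    return cover or None
```

Fed the span above, `find_cover` returns `Czechoslovak stories.pdf/7`. A page with no such element gives `None`, which is the same as the normal page view: nothing to show.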

`{{betacode|ai/nigma}}` - no substitutions allowed

I was going to try going off and figure this out myself, but I think I've convinced myself there's no hope. Please read and see if you agree...?

I had noted your template/module {{betacode}} and got around to trying it out, e.g. {{betacode|ai/nigma}}. But since I'm chiefly interested in permanent insertion, I thought I'd try using {{subst: ... }} to generate and insert the converted Greek chars one time. And... boom!

αίνιγμα

If I do save the page anyway, I see "{{#invoke:betacode|decode}}" has been generated and inserted into the page. But then displaying _that_ page will blow up.

If I then change that to "{{#invoke:betacode|decode|ai/nigma}}" I get good text displayed: αίνιγμα. But there are still problems.

Testing:

- {{greek|αίνιγμα}} αίνιγμα

- {{#invoke:betacode|decode|ai/nigma}} αίνιγμα

- {{subst:#invoke:betacode|decode|ai/nigma}} αίνιγμα

- {{betacode|ai/nigma}} αίνιγμα

- {{subst:betacode|ai/nigma}}

The first 4 variations all produce good display output, and the fifth blows up.

The third variation does substitute generated output into a saved page. However, the output is the source contents of Template:Greek with parameters expanded inline, 1=αίνιγμα and no param 2. (View this same talk page and find "wst-lang")

So if there's a template → module → template chain, we can't use subst:template at all, and subst:#invoke:module substitutes the expanded template source and not just the desired output for display?

(minor: hmm, if there's a redirect from Template:Polytonic to Template:Greek, perhaps 'Greek' should be used in the module's .expandTemplate{title = 'polytonic',} call?)

Shenme (talk) 23:42, 5 September 2021 (UTC)

Oh all this magic goes over my head! But I just saw this:

open Special:ExpandTemplates in a new tab/window

enter {{betacode|ai/nigma}}

click OK

result is just what I'd want:

<templatestyles src="Greek/styles.css"/><span lang="grc" xml:lang="grc" dir="ltr" class="wst-lang wst-lang-grc">αίνιγμα</span>

So... it's possible, for them, but is it possible for mere mortals? After reading things like mw:Help:Substitution#Recursive_substitution I'm feeling very mortal. Shenme (talk) 06:44, 6 September 2021 (UTC)

@Shenme: honestly, I'm not 100% sure how or if subst can work with a module in the mix. I also do not really understand how it works.

However, my conclusion from {{betacode}} was that a template and/or module in this case is a pretty rubbish implementation, and what I am now actually planning (and have started poking half-heartedly at) is a "real" IME using the ULS input framework. Then you could access a Betacode Greek IME from the keyboard symbol next to the editor field (F34633878).

And yes, it should short-cut the redirect. Inductiveload—talk/contribs 06:50, 6 September 2021 (UTC)

Ah, um, that's my interest in Betacode too: as an IME. I figured subst: was a way of taking advantage of your work until I got around to finishing my attempt.

I was going to simply (presumably) do it as a JS widget in user space. I've got the conversion working standalone with no UI. I wanted to output the multi-accented chars as either composed or non-composed, according to inline configuration, plus a couple of other usability variations (strict Betacode vs. loose vs. looser).

Your mentioning ULS is scary sounding. Mebbe we try advancing in parallel, my lash-up homebrew vs. your internals oriented? ;-) Where is the ULS IME hook documentation? Shenme (talk) 07:06, 6 September 2021 (UTC)

(BTW: why no edit links on section titles past a certain leg-wagging section?)

@Shenme: sure, you're welcome to give it a go in wiki-side JS. I just would hate for you to feel put out if a built-in IME came along later.

The ULS way would need to be built into ULS itself, basically by implementing something like https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/UniversalLanguageSelector/+/refs/heads/master/lib/jquery.ime/rules/el/el-kbd.js (this is for Modern Greek, which has rather fewer diacritics).

I think I fixed the edit section links: somehow unclosed {{ breaks it. ¯\_(ツ)_/¯ Inductiveload—talk/contribs 07:22, 6 September 2021 (UTC)
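To make the "composed or non-composed" choice above concrete, here is a minimal Beta Code decoding sketch. It is not how the {{betacode}} module works, and it is not a ULS rule file; `decode` and its `composed` flag are hypothetical names. It assumes lowercase TLG-style Beta Code only: capitals marked with `*`, numbered sigmas and punctuation codes are not handled and pass through unchanged. Because Beta Code writes diacritics after the letter, which is also where Unicode combining marks go, each mark can be emitted in place and the composed or decomposed form chosen with Unicode normalization (NFC or NFD).

```python
import re
import unicodedata

# Lowercase Beta Code letters (TLG convention).
LETTERS = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "h": "η", "q": "θ", "i": "ι", "k": "κ", "l": "λ", "m": "μ",
    "n": "ν", "c": "ξ", "o": "ο", "p": "π", "r": "ρ", "s": "σ",
    "t": "τ", "u": "υ", "f": "φ", "x": "χ", "y": "ψ", "w": "ω",
}

# Beta Code diacritics mapped to Unicode combining characters.
DIACRITICS = {
    ")": "\u0313",   # smooth breathing (psili)
    "(": "\u0314",   # rough breathing (dasia)
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex (perispomeni)
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript (ypogegrammeni)
}


def decode(beta, composed=True):
    """Convert lowercase Beta Code to Greek; NFC if composed, else NFD."""
    out = []
    for ch in beta.lower():
        # Letters and diacritics are converted; anything else passes through.
        out.append(LETTERS.get(ch) or DIACRITICS.get(ch) or ch)
    text = "".join(out)
    # A sigma not followed by a word character is word-final: use ς.
    text = re.sub(r"σ(?!\w)", "ς", text)
    return unicodedata.normalize("NFC" if composed else "NFD", text)
```

With this sketch, `decode("ai/nigma")` gives αίνιγμα as precomposed characters, while `decode("ai/nigma", composed=False)` keeps the acute as a separate combining mark after the iota. Note that the input as typed has no breathing mark; `decode("ai)/nigma")` adds the smooth breathing on the iota.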

Periodicals

I ran across a loose article from The Texas Medical Journal that needed scan-backing and situating in context, and found that the journal was blessedly brief and published over a reasonable timespan, for which there were actually good scans available (i.e. it is actually possible to get it completely proofread if it ends up like an example to point folks at). So I got inspired to set up Portal:The Texas Medical Journal as a sort of way of thinking out loud and experimenting with how to deal with periodicals and what our guidance should really be in the area.

Feel free to opine or tweak. My own thoughts are pretty unformed and hand-wavy.

For example, I am thinking this portal is more "WikiProject" than "top-level mainspace page" in function, and so we'll also want a top level The Texas Medical Journal as a landing / navigation page despite there being little natural content for it beyond volume navigation (which is a point on which I believe you disagree mainly due to the duplication?). Each volume of this journal seems to have an index to the volume, so the intermediary pages can be filled with that at The Texas Medical Journal/Volume 18. But I'm dithering on whether to also have an issue level, since each issue as printed has no toc or index but does have a distinguishable title page. Each issue is short enough to host on a single wikipage ala. a chapter, but it may contain multiple distinct articles that would most naturally have their own named page. So either The Texas Medical Journal/Volume 18/Ligation of the Dorsal Vein of the Penis as a Cure for Atonic Impotence or The Texas Medical Journal/Volume 18/Issue 8#Ligation of the Dorsal Vein of the Penis as a Cure for Atonic Impotence. I'm leaning towards the latter.

Got any thoughts on how we set up the (sub)page structure, and content of each level, for this specific case? Is the portal pointless, and should be made into a WikiProject-type page off in the workbench namespaces somewhere?

And, considering we already have one lonely article from this periodical, how would we deal with it now while almost no other content is extant? I proofread one entire (chapter-sized) issue for demonstration purposes, so we have that much that could be transcluded. But it's not much more than that one article we already had (Ligation of the Dorsal Vein of the Penis as a Cure for Atonic Impotence). I would not have been necessarily opposed to a proposal to migrate the extant text to the Index: and delete the mainspace page had one been made. But let's say it was a really seminal article by a famous author, widely cited and widely linked, that we really wanted to keep. How much of the structure do we think this particular case would need to justify having it in mainspace?

I'm currently thinking I'll create the page for the journal with the content from the portal minus the scan links, and a per-volume page for Vol. 18 that transcludes its volume index and has an AuxTOC listing the issues, and a page where I transclude the contents of Issue 8. But I'm ambivalent enough I may change my mind several times before pulling the trigger.

Oh, also, no hurry. I was fixing up that lone article and saw it as a good case to explore the issue, but I haven't been giving it a lot of thought since last it came up and my head is really elsewhere currently so this'll be a "keeping it warm" type thing. Xover (talk) 16:00, 6 September 2021 (UTC)

@Xover: Ha, well, this all sounds very, very familiar! This is essentially the same question that is still open at Wikisource:Scriptorium#Policy_on_substantially_empty_works, to which there was a lot of words and very little in the way of actionable consensus.

Personally, I disagree with shunting things away into Portal: or WikiProject space, because that's a major barrier to entry for setting up any article. We should make it possible for people to "slot in" articles as they please and see them presented and findable in mainspace immediately. Or we might as well make periodicals functionally out of scope, because I do not think there is a practical hope of any given periodical ever being complete (other than freak occurrences like the PSM). Living in the real world as we do, it's substantially more likely that someone will create an article or two than plough through an entire volume or even issue. Especially as we get more periodicals that aren't just text and images in simple layouts. If you told me as a new user I had to proofread an entire issue to get one single article transcluded, I'd get 2 pages into the rest of the volume, lose heart and leave. And I also do not think a "Cantor's dust" structure of an arbitrary number of articles floating around in mainspace untethered to parent pages is a good idea.

I think we should just put them into mainspace right from the off, even if it's literally just a list of volumes and IA/Hathi links (which in itself represents a substantial amount of editor effort, since locating and collating series is quite involved) plus a Wikidata item to catch the authority control. Any other method will immediately result in duplicated content once the list is copied to mainspace (the point this is supposed to happen is unclear to me, somewhere between "one article" and "every article ever"), and the end result will be that one of the pages, likely the Portal, rots. We do not have the editor interest to maintain 2 separate lists of volumes per periodical and keep them in sync. A Portal is a supplement to a work, in my opinion, not a substitute (i.e. it should come after).

The Portal can be very valuable as a thematic index, list of contributors, referencing works, historical contextual works, etc, and, in fact, that's somewhere WS can add mind-blowing amounts of value-add (especially with decent WD support). But it should not, IMO, be the only, or even main, entry to the work. And there should certainly not be a redlink to the work in mainspace, as the number of people who know they should also check the Portal namespace upon seeing a redlink is probably only in the thousands in the world. Furthermore, since WikiProjects are not even in the default search set, I'd say that if you were proposing to set up periodicals as WikiProjects until "some" future completion threshold, you might as well just leave the periodical at the Internet Archive for all the good the effort, typing and time would do you.

If I was in charge, I'd say:

Create The Texas Medical Journal right now
Populate it with as many volumes + scan/source links as you know of (ideally all, but could even just be the one)
- Ideally, set up all the volumes as a batch import. This would be wildly improved by WD support, but WD do not appear to actually care about using any of their data, so it's on us to find, implement and maintain a suitable schema. So I am not in the mood to do the work and perform batch imports myself at this time, knowing that I'll have to come back and re-do half of it at some point. One day. Maybe. Plus I CBA to do the pagelists.
Create The Texas Medical Journal/Volume 18 right now and place an AuxTOC, or the actual TOC if you can be bothered to proofread it (which you have done).
- Depending on the work structure, create The Texas Medical Journal/Volume 18/Number 1 if you want to split by number (which I think is good, since it's "as published" and also sidesteps having to deal with "Publisher’s Notes" in every issue by some kind of suffix), but I see that that is not a universal opinion and I just don't care enough to die on that hill.
Create the article itself and link from the TOC, make a WD item, etc

I also did what you have done and set up a "minimal" example periodical as show-and-tell for this discussion: Journal of Classical and Sacred Philology (I didn't create the page, merely co-opted an empty periodical's page). No-one has ever responded to my question on the Scriptorium about what exactly should happen to that work.

Sorry about the text wall, somehow it's impossible to talk about periodicals in less than 500 words. Inductiveload—talk/contribs 17:09, 6 September 2021 (UTC)

Interestingly, I'm right in the middle of setting up Transactions and Proceedings of the New Zealand Institute. I've created the base page and am (slowly) bringing in the Indexes and doing their pagelists. I'll also do the TOC for each issue—so that they are consistent. At the same time I've created Portal:Royal Society of New Zealand to be the holder for the thematic lists. The reason for using the publisher as the Portal rather than the work is that the Transactions is not their only journal. Depending on the size of some themes and what it ends up looking like, may well need sub-portals. I'm seeing the Portal as an effort co-ordination point, but only as links from articles we already host. So, I'll be doing a smattering of articles from the five domains (Zoology, Botany, Geology, Chemistry, Miscellaneous) and some of the Proceedings. The idea being, "I wonder what else is here on that topic; let me click this link; Oh my, they need some help in my area of interest." I'll also populate the author pages with the redlinks out of doing the TOCs.

In terms of structure, I'm fortunate in that the Articles in each volume are numbered. So, Transactions and Proceedings of the New Zealand Institute/Volume 6/Article 2 will lead to the article On Observed Irregularities in the Action of the Compass in Iron Steam Vessels (to pick one at random). However, the Proceedings are not numbered, so various gymnastics are being followed to deal with repeats of names between the sections.

wrt the concept of splitting by number, I agree with doing this. In general, an Index should have a single mainspace place it goes to (the exception being collections of unrelated works).

Just a small wall from me. Beeswaxcandle (talk) 18:12, 6 September 2021 (UTC)

Yeah, I think one of the essential tensions here is between enabling people to dip in and do a single article versus keeping mainspace a pure presentation namespace with none of the dust and mess (scan/index links) of the back rooms (WS:, Index:, Page:). The other being between the desire to have only fully finished works in mainspace versus the reality that most periodicals are far too massive and complicated to make such a requirement viable.

I waffle back and forth on these constantly, so this Texas Med. J. stuff is an attempt to fumble my way to some kind of enlightenment, if only in baby steps. Xover (talk) 18:39, 6 September 2021 (UTC)

Wall of text? Dude, you know who you're talking to here! :)

But, yeah, the Scriptorium discussion that rather predictably (sadly) didn't go anywhere is what I am trying to keep warm. And because my brain hurts when I try to think about the big ball of spaghetti I'm approaching it one little bit at a time. Periodicals. Of a finite and manageable size.

Since we already had an article that I was trying to clean up, I have few compunctions about slapping up otherwise-empty structure around it. It's an improvement however one looks at it. But at the back of my mind is the voice howling that I wouldn't want to keep a single small entry from a dictionary or something like that, and I'd be very annoyed if, say, a large number of such popped up over a relatively short time span (complete with bot-created Not proofread raw OCR pages). There has to be some way to square this circle.

Maybe a separate kind of mainspace page, with separate, visually distinct, {{header}} template, separate policy and style guides? If we carve out periodicals from the dictionaries and encyclopaedias, maybe the problem becomes more tractable? Software support for periodicals, with good Wikidata (or structured data) integration, that makes otherwise-empty structure less offensive to those that swing that way? Xover (talk) 19:09, 6 September 2021 (UTC)

Is there something that could be leveraged off the Type field on the Index page? Most just default to Book, but these should all be on as Journal (if we want to change the name to Periodical, that's fine). So, if the Type is Journal, could there be some automagic that does {{Periodical header}} etc.? Beeswaxcandle (talk) 19:41, 6 September 2021 (UTC)

[I'm going to mix up a few replies here, sorry]

@Beeswaxcandle:

"Transactions and Proceedings of the New Zealand Institute": this looks pretty much exactly like what I think works well.

"Is there something that could be leveraged off the Type field on the Index page": technically, yes, perhaps (10× easier with the PRP Lua patch at Gerrit). However, IMO, it makes more sense to drive this kind of thing via Wikidata. Then automagic is more than possible (and, in fact, I would say the only scalable solution). Modelling such things at Wikidata is pretty much up to us.

So we probably should (as done here by all three of us) start small and totally hammer out a very small handful of exemplar periodicals. Then document the absolute hell out of it and start rolling out to other periodicals.

@Xover:

"none of the dust and mess (scan/index links) of the back rooms (WS:, Index:, Page:)": I have to say that I find a discreet {{small scan link}} after entries in a volume list to be singularly inoffensive, and if I had to choose between that and maintaining two completely separate venues for the same list, separated by a Portal link that a casual reader will not know about, and differing only in the presence of that little link, I'd choose a single list. We will likely never (barring a general AI with an interest in proofreading) have complete coverage, and at least providing some link to scans is a very useful service, since we are the only library that provides that list and allows others to expand and correct it. Over time we should work on importing scans so at least it goes {{ext scan link}} → {{Commons link}}, but that will be "easy" if we can hash out WD support.

"I'd be very annoyed if, say, a large number of such popped up over a relatively short time span (complete with bot-created Not proofread raw OCR pages)": I wouldn't mind if proofread articles popped up all over the place. OCR dumping is pointless and slightly annoying, but ultimately is contained to the Page: namespace in most cases (though the lack of an actual rule that says you can't transclude red pages causes bad feeling when it inevitably leads to a WS:PD showdown). However, if we're going to allow a standalone article in mainspace (and I absolutely think we should, because articles are independent units of work and valuable (or not) in their own right, rather than as part of the whole) we should also allow the parent pages to join it all up and provide a central anchoring point in mainspace.

"visually distinct, {{header}} template, separate policy and style guides": yes * 2.5 (not sure the header needs to be distinct as such: maybe just a banner on the top level saying "periodical incomplete").

"If we carve out periodicals from the dictionaries and encyclopaedias, maybe the problem becomes more tractable": I think this is sensible. Encyclopedias and dictionaries are their own things and mostly seem happy as they are. Binding them up together will just cause a logjam.

Inductiveload—talk/contribs 20:13, 6 September 2021 (UTC)

Thank you for the tweaks

I just couldn't resolve the image well enough so I couldn't be sure, and your fix was appreciated. Unfortunately, it looks like it's headed for the tip. I hate copyright.

Interestingly, while one can play with the scan image at 1024px as seen during proofreading, you *can't* get commons to display the really large 7462px image from the page "Image" link:

"Error: 500, Internal Server Error at Tue, 07 Sep 2021 10:02:45 GMT"

I noted your use of {{wsp}} as I had just come across it in searching templates. (Because I was looking for {{word spacing}}) There is *so* *much* out there. Found {{Rbstagedir}} and don't even remember where that'd been useful. Templates aren't really categorized well, are they? Shenme (talk) 10:14, 7 September 2021 (UTC)

Merging Two Indexes for Sherlock Holmes

Hello. :) I'm trying to create the indexes for the Sherlock Holmes and I noticed that one of the issues is already present and should be merged into the bi-yearly volumes. Namely Index:The Strand Magazine - Volume 2 number 7 (July 1891).djvu should be merged into Index:The Strand Magazine (Volume 2).djvu with an offset of +1. Languageseeker (talk) 04:36, 9 September 2021 (UTC)

Re: new {{FIS}} top_caption option

Hi. If it's still possible, please change "top_caption" to "top-caption". The other options are using the hyphen, not the underscore. — Ineuw (talk) 09:39, 9 September 2021 (UTC)

@Ineuw:

Done. Inductiveload—talk/contribs 10:07, 9 September 2021 (UTC)

Another Periodical Merge

Index:Mrs. Dalloway in Bond Street.pdf should be merged into Index:The Dial (Volume 75).pdf and deleted. Pages marked in the Index. Many Thanks. Languageseeker (talk) 00:33, 10 September 2021 (UTC)

@Languageseeker:

Done. Inductiveload—talk/contribs 08:37, 10 September 2021 (UTC)

Thanks! Languageseeker (talk) 00:20, 12 September 2021 (UTC)

Non-existent template called from Template:Header (Module:Header) for errors in `year` parameter

Notice this page. Works with non-numeric year (e.g. “1220s”) claim to use a template, but I can’t make heads nor tails of Lua. Could you fix this? TE(æ)A,ea. (talk) 22:42, 12 September 2021 (UTC)

@TE(æ)A,ea.: this is not related to the module. It was a 6-year-old typo in {{header/year}} (introduced here). This is a rather complex template that IMO is a perfect example of when a module would be clearer, since the huge nested if statements are a readability disaster area. Inductiveload—talk/contribs 07:12, 13 September 2021 (UTC)

Gonna claim "Great minds…" on this one

cf. 11686853. Module:Header/year. As the diff says, it is a "first cut of bone stupid reimplementation of Template:Header/year in Lua", but it gets the mechanical bit out of the way.

And while I am making unconnected comments on discussions flying by my watchlist: Module:Header really should be consolidated into doing everything in-Module instead of leaving the main loop in template code. For example, this is currently non-functional and (needlessly) hard to implement correctly when the template controls the entry points.

It is a shortcoming of MW that Scribunto doesn't, ironically enough, have a (html) template facility, but the resulting tradeoff is pretty much a no-brainer as far as I'm concerned. Xover (talk) 11:39, 13 September 2021 (UTC)

@Xover: oooo shiny. ^_^

Certainly there is more work to do with the header template/modules, but now the bulk logic is moved out, I'm hoping to gently iterate towards a modular Lego-kit of utility templates and modules (and template-modules where you can invoke/transclude/require as appropriate with the same APIs), each with simpler APIs and expectations.

RE: in-module templating: you can farm out to sub-templates with expandTemplate, which I think is a pretty handy pattern sometimes, e.g. {{header/main block}} seems more readable that way than as pure Lua-driven mw.html nodes. The problem (not really a problem, just boilerplate) there is that you have to pass the frame around internally or use mw.getCurrentFrame(). Inductiveload—talk/contribs 12:03, 13 September 2021 (UTC)

I disagree on mw.html vs. {{header/main block}}. You certainly can find cases where an expandTemplate-ed Template: is cleaner than mw.html, but in most cases where you need an html-template type of template, a Template: is usually going to be messy (it makes all the wrong tradeoffs). Deeply nested, complex markup structures are certainly going to look relatively unreadable in mw.html, but I'd say the alternative is a hypothetical future real (html-)template solution for/in Scribunto and not a Template:-based pseudo-html-template (which they aren't really designed for; they're more akin to primitive macros than templates, though having a bit of both in them). Xover (talk) 17:21, 13 September 2021 (UTC)

Batch Upload for The Strand

I created this phab ticket task T290816 for the remaining volumes of The Strand, but Aklapper removed it from your userboard. Would you mind running this? Much appreciated. Languageseeker (talk) 12:22, 12 September 2021 (UTC)

OK, I'm on it now (it'll take quite some time). Note that the license field should actually be the contents of Commons:Template:PD-scan. In this case I think Commons:Template:PD-old-assumed is pretty much the best we can do up to 1900, and after that we should probably import to WS instead, since otherwise it's a long slog through every known author to ensure PMA 70. Inductiveload—talk/contribs 16:33, 12 September 2021 (UTC)

Thank you for running this. You're probably right. It's best to make all the post-1900 local files. I'm not sure what the license field should be like. If it's not the template, then is it the code of the template?

My basic idea is to create a batch file for the complete run of a periodical once a week in the hope that this will encourage users to use them to create scan-backed texts. Otherwise, it's a lot of work to find and import a 600+ page file for only a few pages. I'm also going to try to merge any periodical fragments that I find into full volumes. Would that be too much for you? I'm not sure how much work it is on your end and I don't want to overwhelm you. Once again, thanks for doing this and sorry about the confusion on Phab. Languageseeker (talk) 23:43, 12 September 2021 (UTC)

@Languageseeker: I don't mind you stacking up tasks, but I probably won't be able to sustain one a week of this size (specifically, with this many Hathi volumes) as they are incredibly slow to download - since I started yesterday, I have 8 volumes downloaded. IA is probably easier, especially if the DJVU already exists.

Don't worry about the Phab confusion, I'm not sure if one is "allowed" to use Phab for on-wiki tasks like this, but it's fine by me (and even preferable, since, as well as making it easy to host the data files, a task tracker is, y'know, good at tracking tasks). If it's not allowed, I guess someone will tell me at some point.

If there is going to be a backlog, Phab is certainly easier for me, since otherwise it'll just get lost on a talk page or something.

BTW, if you are using spreadsheet formulae, I'd rather have the XLSX file if possible, since then I can adjust the formulae if I need. ODS is OK, but XLSX is better as the script has an XLSX ingest function, so I'd have to convert an ODS anyway.

For the license field, it should be what goes inside Commons:Template:PD-scan. So, for example, PD-old-assumed.

I'm also going to try to merge any periodical fragments that I find into full volumes. Good idea. I can do page moves easily: I just need the two index names, the relevant offset and the page range in question: User:Inductiveload/Requests/Move pages or indexes. Inductiveload—talk/contribs 09:45, 13 September 2021 (UTC)
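A minimal sketch of how a move list could be derived from those inputs, assuming page n of the fragment lands on page n + offset of the full volume (so an offset of 730 means pg 1 = pg 731). `buildMoveList` is a hypothetical helper written for illustration, not the code behind the requests page, and it only produces the title pairs; it performs no moves.

```javascript
/**
 * Build the Page: title pairs needed to merge a fragment index into a full
 * volume. Hypothetical helper, for illustration only.
 *
 * @param {string} sourceFile file name of the fragment, e.g. "Fragment.djvu"
 * @param {string} targetFile file name of the full volume
 * @param {number} offset added to each source page number
 * @param {number} first first source page to move (inclusive)
 * @param {number} last last source page to move (inclusive)
 * @return {Array<Array<string>>} [ from, to ] title pairs
 */
function buildMoveList( sourceFile, targetFile, offset, first, last ) {
	if ( !Number.isInteger( first ) || !Number.isInteger( last ) || first > last ) {
		throw new RangeError( 'page range must be integers with first <= last' );
	}
	const moves = [];
	for ( let n = first; n <= last; n++ ) {
		// Source page n becomes target page n + offset.
		moves.push( [
			`Page:${sourceFile}/${n}`,
			`Page:${targetFile}/${n + offset}`
		] );
	}
	return moves;
}
```

Keeping the range explicit (rather than "all pages") matters when covers at either end of the fragment should not be moved.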

Wow, I did not realize that this would be this complicated. I'm definitely going to slow down and wait until one periodical finishes before I request another. Ironically, this makes me even more convinced that this needs to be done preemptively. After all, if it takes a user, an interface administrator, and a WMF team member several days to upload one volume, how is an ordinary user supposed to do it? Hopefully, seeing the number of volumes that need to be imported server side will convince somebody at the WMF to actually fix uploading. But, who knows? As always, a huge thank you. Languageseeker (talk) 01:34, 14 September 2021 (UTC)

mark-proofread deps fix

Some weird timing thing or upstream change makes MediaWiki:Gadget-mark-proofread.js bomb on mw.api being undefined. Since it doesn't internally armour-plate this, could you set its deps to mediawiki.util,mediawiki.api in MediaWiki:Gadgets-definition?

It's been working fine without it for yonks, so something external must have triggered it, but I don't have the cycles to actually debug what's happened there just now. Xover (talk) 07:24, 14 September 2021 (UTC)

@Xover: Done. Entropy do be that way. Inductiveload—talk/contribs 12:45, 14 September 2021 (UTC)

Question about how to run a serial

I’m planning to run “The Smart Set” next, but I’m running into a bit of a conundrum. Some of the volumes are available on HT and HT2, scanned by Google, but all the issues are available on IA from microfilm. I see two possibilities. One, we can upload the complete volumes from HT and then combine the individual issues from IA into volumes as well. This will be slower and will probably run into the cache bug, but will have better image quality. Or, we can just upload all 354 issues. This will be faster, but will have worse image quality and a gigantic volume listing. The second option would also probably require writing a script that replicates the way in which the IA creates identifiers. Thoughts? Languageseeker (talk) 20:02, 15 September 2021 (UTC)

I think the Hathi volumes are a better bet, at least for the bulk of it. Then fill in gaps as needed. Inductiveload—talk/contribs 20:40, 15 September 2021 (UTC)

Alright, that makes sense. Do you think it is possible to modify the batch upload script to support combining multiple issues into one volume? Something like sim_smart-set_1930-03_86_1;sim_smart-set_1930-03_86_2;..;sim_smart-set_1930-03_86_6 would download the six issues and combine them into one volume? Languageseeker (talk) 00:57, 16 September 2021 (UTC)

@Languageseeker: Hmm, mayyyyyybeeeeee, but it would be quite a bit of hacking for a relatively rare thing. How many missing volumes are there at Hathi? Inductiveload—talk/contribs 06:02, 16 September 2021 (UTC)

There are 28 volumes missing from the HT. I feel like this won't be the only case because IA is digitizing the microfilms of entire print runs which can fill in gaps from other volume sets. Languageseeker (talk) 12:46, 16 September 2021 (UTC)

Darn, that's quite a few. I'll see if I can make the changes, but it might take a while, so I guess don't count on those volumes being ready imminently! It's obviously going to be better in general to avoid the IA SIM collection just because the quality is pretty bad (not the IA's fault, just a fact of the medium). That said, the project is pretty cool. Inductiveload—talk/contribs 12:55, 16 September 2021 (UTC)

No worries at all. I know that you have a lot on your plate already. I'll just wait until you get a chance and then I'll create the batch upload request. I agree about the IA SIM collection, but it's a good last resort. Languageseeker (talk) 22:30, 16 September 2021 (UTC)
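For illustration, a sketch of how a semicolon-separated field like the one above could be parsed and sanity-checked before anything is downloaded. The identifier layout `sim_<periodical>_<YYYY-MM>_<volume>_<issue>` is inferred from that single example and is an assumption, as is the helper name `parseIssueList`; the batch upload script's real input format may differ.

```javascript
// Layout inferred from one example identifier; real IA SIM identifiers may vary.
const SIM_ID = /^sim_([a-z0-9-]+)_(\d{4})-(\d{2})_(\d+)_(\d+)$/;

function parseIssueList( field ) {
	// Split on ';', trim whitespace and drop empty entries.
	const ids = field.split( ';' ).map( ( s ) => s.trim() ).filter( ( s ) => s !== '' );
	const issues = ids.map( ( id ) => {
		const m = SIM_ID.exec( id );
		if ( !m ) {
			throw new Error( `unrecognised identifier: ${id}` );
		}
		return {
			id,
			periodical: m[ 1 ],
			year: Number( m[ 2 ] ),
			month: Number( m[ 3 ] ),
			volume: Number( m[ 4 ] ),
			issue: Number( m[ 5 ] )
		};
	} );
	// Only issues from one volume of one periodical should be combined.
	const volumes = new Set( issues.map( ( i ) => `${i.periodical}/${i.volume}` ) );
	if ( volumes.size > 1 ) {
		throw new Error( 'issues span more than one volume' );
	}
	// Concatenate in issue order, whatever order they were listed in.
	return issues.sort( ( a, b ) => a.issue - b.issue );
}
```

Failing loudly on a mixed list is a design choice: a silently merged wrong issue would be much harder to spot after upload.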

New Maintenance Category

With the advent of the ProofreadPage Lua library, do you think it's possible to create a page that lists the top 10 indexes with the fewest unproofread pages remaining (but more than 0) that were last worked on more than one month ago? This could be a good maintenance category for almost-completed works that got abandoned. Languageseeker (talk) 22:30, 16 September 2021 (UTC)

@Languageseeker: This isn't really something the Lua library can do for us. Right now, it's probably something that would need to be done by a bot, and it might even need backend API support, since AFAIK an index's proofreading stats are not exposed through the API (yet); Lua is actually ahead of the curve here. The best way might be to adjust Special:IndexPages to be more useful. For example, Special:IndexPages could learn an excludeZero=1 parameter or similar. Inductiveload—talk/contribs 07:25, 17 September 2021 (UTC)

Yes, I think it would be good to exclude completed indexes from the To Be Proofread and To Be Validated sections. Would it be possible to rank them according to % of pages remaining? Languageseeker (talk) 16:46, 17 September 2021 (UTC)
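A sketch of the requested ranking, assuming the per-index statistics are already available as plain records (the reply above notes the API does not yet expose them, so the data source is left open). The record shape, the function `nearlyDoneIndexes`, and the 30-day stand-in for "one month" are all hypothetical.

```javascript
// Hypothetical record shape:
// { title: string, totalPages: number, remainingPages: number, lastEdit: Date }
function nearlyDoneIndexes( indexes, now, { minIdleDays = 30, limit = 10 } = {} ) {
	const cutoff = now.getTime() - minIdleDays * 24 * 60 * 60 * 1000;
	return indexes
		// Exclude finished (excludeZero-style) and empty indexes.
		.filter( ( ix ) => ix.remainingPages > 0 && ix.totalPages > 0 )
		// Keep only indexes untouched since the cutoff.
		.filter( ( ix ) => ix.lastEdit.getTime() < cutoff )
		.map( ( ix ) => ( { ...ix, fractionRemaining: ix.remainingPages / ix.totalPages } ) )
		// Smallest share of remaining work first; ties broken by absolute count.
		.sort( ( a, b ) => a.fractionRemaining - b.fractionRemaining ||
			a.remainingPages - b.remainingPages )
		.slice( 0, limit );
}
```

Ranking by fraction rather than raw count keeps a 600-page volume with 10 pages left from outranking a 20-page pamphlet with 2 left, which is one reading of "% of pages remaining".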

Latin not italic? :-)

I tried using {{latin|quid ergo dico?}} and was surprised to see that the output was not italicized "quid ergo dico?", though the doc claims "By default, the output of this template is italicised." Was that lost when you switched away from Template:Latin/styles.css ? I was editing the bottom of Page:The New Testament in the original Greek - Introduction and Appendix (1882).pdf/132 where indeed the latin is supposed to be italicized. Shenme (talk) 06:37, 17 September 2021 (UTC)

@Shenme: oooop, that's my fault. I made it default-italic and then realised this is probably incorrect often enough to be awkward and put it back. If all (or nearly all) of the Latin in a work is italic, you can set .wst-latin{ font-style: italic; } in the index CSS. In this case, that is Index:The New Testament in the original Greek - Introduction and Appendix (1882).pdf/styles.css. I will shamefacedly update the {{latin}} docs. Thanks for the note! Inductiveload—talk/contribs 07:06, 17 September 2021 (UTC)

No shame, just second thoughts, and more than likely correct, as I'm working on a text with "quem Deus vult perdere prius dementat" _not_ italicized. Better not to mix formatting with the separate function of classification. 's cool. Shenme (talk) 18:23, 17 September 2021 (UTC)

Periodical Merge

Index:Education of the Negro.djvu should be merged into Index:The Atlantic Monthly vol. 69.djvu with an offset of 730, i.e. pg 1 = pg 731. Languageseeker (talk) 02:30, 17 September 2021 (UTC)

@Languageseeker: Done (actually it was an offset of 728 because the first and last two pages were covers). Inductiveload—talk/contribs 07:42, 17 September 2021 (UTC)

Thank you! Languageseeker (talk) 02:35, 20 September 2021 (UTC)

Crawling around (TOC table cells)

At one time, a while ago, when the engineers needed to retrieve a mainframe cable they knew was unused under the computer center's raised floors, they'd point to a floor square and say "we think one loose end is there" and I'd disappear under the floor and untangle and haul out the very expensive, very heavy cable - 100s of wire pairs - 50/100 or more feet. Dirty and ripped clothes - good reason to stop dressing up for work, yes?

So I'm likening that to templates. ;-) Tell me when you'd like to re-re-re-peek at the {{TOC begin}} family. While {{TOC row 1-1-1}} has vertical-align:bottom on the last "page number" cell, {{TOC row 1-c-1}} doesn't. If it is assumed that 'all' last cells are page number cells, then 'all'...?

But it's not really important right now, compared to all the other stuff you are accomplishing, I think. So later...? (noticed on this page) Shenme (talk) 02:04, 20 September 2021 (UTC)

@Shenme: quite right. I have fixed the template, but you will have to modify your page so all that white-space isn't "sucked into" the last cell of each row. See phab:T232477 and the linked conversation for gory details of why it works this way.

The "right" way is to add a class to the rows and set padding-top on them in the index CSS, but chucking a {{dhr}} into the cell is rough-and-ready. Inductiveload—talk/contribs 07:49, 20 September 2021 (UTC)

Thanks. I'm intrigued by {{optional style}}. I've been thinking that many templates ought to allow a 'style' parameter escape mechanism for unthought-of usages. Mebbe {{optional style}} could be used for that. I'll look at the phab ticket another day, when stomach stronger. Shenme (talk) 04:27, 21 September 2021 (UTC)

Lots of templates do provide a style override ({{optional style}} is just syntactic sugar for a complicated {{#if:}} construction). However, often a class is better. In this case, a style parameter won't help much, because the padding has to go on the <TD>, not the <TR>. So you can use Page styles like this too: Special:Diff/11703295 and Special:Diff/11703293. In general, if you find yourself piling custom CSS into a style parameter more than a handful of times, you should consider whether a template or a class would suit better. Inductiveload—talk/contribs 09:12, 21 September 2021 (UTC)

Page Range for MC

It occurs to me that if we begin to run sections of works in the MC, it would make sense to have an option to set the page range. For example, "The Red-Headed League" is part of Index:The Strand Magazine (Volume 2).djvu but only pages 190:203. So, the size of the work is only 14 pages not 666. This way, we won't have erroneous number of pages when running parts of periodicals or multiple excerpts from a work. Do you think this is doable? Languageseeker (talk) 06:02, 13 September 2021 (UTC)

@Languageseeker: It probably is, but it might take a bit of faffing about. I'll give it a go at some point. Inductiveload—talk/contribs

OK, having reflected on this a bit more, it's actually quite hard, because the statistics currently work on a per-index, not per-page level. So it'll need quite a bit of back-end faffery to only record page status changes for the page range of interest. Otherwise if you just limit the page count of the index, you end up with the pages outside the range of interest contributing to the month's stats. So this will probably need to work for the DB-driven change querying that's slowly chugging along (see phab:T172408 and the little constellation of issues around that). Inductiveload—talk/contribs 12:39, 13 September 2021 (UTC)

Aww, ok. That makes sense. I think there are two things that still would be nice and might be possible. 1) To have some way for these selections to appear in the "Under 50" section. Maybe we can have a boolean for Under_50? This way a user will be able to see which texts are the short ones. 2) It's no longer possible to assume that an Index name will be unique, so it might make sense to have a function that checks for and removes any duplicates. This should prevent errors when calculating the total number of pages each month and the total number of pages proofread. I know that there will be no way to distinguish whether a user proofread P220 or P440, but it would probably be best not to count P440 more than once. Languageseeker (talk) 01:30, 14 September 2021 (UTC)

There is such a thing: set short to true to force it to appear as a short work.

For now, you can only have an index in the MC once. I'll see if it can be changed, but it'll take some messing about. Inductiveload—talk/contribs 12:47, 14 September 2021 (UTC)
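If duplicate entries are ever allowed, the de-duplication asked for above could look something like this sketch, which counts each index's pages only once when totalling. The `{ index, pages }` entry shape and the function name are hypothetical, not the MC's actual data format.

```javascript
// Hypothetical MC entry shape: { index: string, pages: number }.
function totalPagesCountingEachIndexOnce( entries ) {
	const seen = new Set();
	let total = 0;
	for ( const { index, pages } of entries ) {
		if ( seen.has( index ) ) {
			continue; // Same index listed again: its pages are already counted.
		}
		seen.add( index );
		total += pages;
	}
	return total;
}
```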

Languageseeker, Inductiveload: How would this index work for splitting up larger periodicals? The pages of this “index” are taken from the volume of The Dial, so there’s no duplication of that sort. TE(æ)A,ea. (talk) 23:12, 30 September 2021 (UTC)

@TE(æ)A,ea.: Thank you for creating this. Ultimately, I hope to reduce the number of split indexes in favor of entire volumes, so I'm not in favor of this approach. My goal right now is to try and figure out how to begin proofreading periodicals. Based on past experience, users are not attracted to Periodical Volume X because nobody knows what's inside and there is quite a bit of less interesting material. Right now, I'm testing out featuring individual articles/stories. So after Sherwood Anderson, The Man's Story, I plan to change the title and cover to another article from that issue of The Dial, which has contributions from authors such as Thomas Mann, T.S. Eliot, W.B. Yeats, etc. This will also help to create scan-backed copies of short stories and serialized novels.

This month, the focus has mainly been on how to get the scans onto WS. It took almost the entire month to get The Strand uploaded due to various cache bugs.

Once we figure out the logistics, I plan to create a suggestion section for periodicals. There will probably be an open-ended section for anything from any periodical and a more restricted one for what do you want from this volume of periodical X. Any thoughts or ideas would be more than welcomed. Languageseeker (talk) 02:42, 1 October 2021 (UTC)

Save load actions

Hello. How exactly is User:Inductiveload/save load actions supposed to work? I thought that after copying the line to my common.js the author link template will be replaced by a common author link when I save a page, but it does not happen in this way. Did I understand it wrong? --Jan Kameníček (talk) 17:09, 25 September 2021 (UTC)

@Jan.Kamenicek you also have to add your desired actions according to the configuration section. I can add more details later; I'm out at the moment. Inductiveload—talk/contribs 18:30, 25 September 2021 (UTC)

Batch removal of " — "

In Index:Rambles in Germany and Italy in 1840, 1842, and 1843 - Volume 1.djvu and Index:Rambles in Germany and Italy in 1840, 1842, and 1843 - Volume 2.djvu, there are a myriad of " — " that should be "—"; "— " that should be "—"; "modem" that should be "modern"; and " ;" that should be ";". Is there any way to do a batch job to go through all the pages and correct these? Languageseeker (talk) 17:31, 30 September 2021 (UTC)

@Languageseeker: Yes. If you want to go through them yourself, you're probably best off using w:Wikipedia:AutoWikiBrowser, or you could use Pywikibot like this:

pwb.py replace -regex -prefixindex:"Rambles in Germany and Italy in 1840, 1842, and 1843 - Volume 1.djvu" -ns:Page "\s*—\s*" "—" "\bmodem" "modern"

Or use a pairsfile (see mw:Manual:Pywikibot/replace.py for all the many options).

And/or use User:Inductiveload/quick_pwb.py to run the whole thing from a file input.

Note that you should check each edit carefully, be prepared to fix mistakes, and also respect the recent changes feed by not slamming it with rapid-fire changes without a bot flag. Inductiveload—talk/contribs 18:23, 30 September 2021 (UTC)
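The same replacement pairs (plus the " ;" fix from the original request) can be sketched in JavaScript, e.g. for a user script or a TemplateScript action. Two deliberate departures from the pwb command, both editorial assumptions: the dash rule only matches spaces and tabs (not `\s`), so paragraph breaks next to a dash are not collapsed, and `\bmodem\b` leaves words like "modems" alone. `applyFixes` is a hypothetical name; each resulting edit still needs checking as advised above.

```javascript
// Replacement pairs, applied in order.
const FIXES = [
	// A dash with a space or tab on at least one side; a bare "—" is not matched.
	[ /[ \t]+—[ \t]*|[ \t]*—[ \t]+/g, '—' ],
	// Whole-word OCR misread of "modern".
	[ /\bmodem\b/g, 'modern' ],
	// Space(s) before a semicolon.
	[ / +;/g, ';' ]
];

function applyFixes( text ) {
	let changes = 0;
	for ( const [ pattern, replacement ] of FIXES ) {
		// A callback lets us count every individual replacement.
		text = text.replace( pattern, () => {
			changes++;
			return replacement;
		} );
	}
	return { text, changes };
}
```

Returning the change count makes it easy to skip saving pages where nothing matched.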

Would you mind doing this with a bot flag? The printer did not insert spaces after em-dashes and it seems to be a by-product of some computer doing it automatically. I'm happy to check over the work, but I really don't think that there will be any actual em-dashes with spaces in the original text. Mary Shelley used hundreds of em-dashes, so it's quite a tedious task to do manually. Also, I doubt that she had a modem in her voiturier. Languageseeker (talk) 18:32, 30 September 2021 (UTC)

Done. The edits looked sane as they went past, but please check Special:Contributions/InductiveBot too. About 30 or 40 pages per volume were affected. Inductiveload—talk/contribs 18:43, 30 September 2021 (UTC)

Checked over all of them and there was not a single instance of an actual space before or after an em-dash. Interestingly enough, there was one page with spaces before and after a {{bar|2}}. This was a nice way to wipe out several hundred mistakes in one go. Some of the pages were even marked as validated. This might be a good scan to run on Index pages that are proofread/validated, because I suspect that this will not be the only case. It might make a good clean-up project. What do you think? Easy to run (hopefully), easy to check, and high impact. Languageseeker (talk) 20:28, 30 September 2021 (UTC)

@Languageseeker: Well, the script to run is right there ↑.

For less script-y people who also don't want to use AWB, a web front end to PWB that shows you replacements is possible, but it's basically just w:User:Joeytje50/JWB:

	mw.loader.load('//en.wikipedia.org/w/index.php?title=User:Joeytje50/JWB.js/load.js&action=raw&ctype=text/javascript');

AFAIK, that tool works on Page pages now (it didn't used to, but I moaned at the maintainer). I don't really use it because I use User:Inductiveload/quick_pwb.

Setting up a WikiProject Typos (or whatever it's called) is possible, but you'll have to find someone else to run it, as I do not have bandwidth.

Someone else mentioned a similar set of fixes in validated pages at WS:S: Wikisource:Scriptorium/Archives/2021-05#Duplicate words and Wikisource:Scriptorium/Archives/2021-06#Suspected_OCR_errors, which could be food for thought. Inductiveload—talk/contribs 07:30, 1 October 2021 (UTC)

Add Category for October MC

Could you please run your bot and add the October category to the texts in this month's MC? Languageseeker (talk) 16:41, 1 October 2021 (UTC)

Done Inductiveload—talk/contribs 18:19, 1 October 2021 (UTC)

Thanks. Languageseeker (talk) 21:44, 1 October 2021 (UTC)

The Strand

The number of blue links on The Strand makes me happy. However, it seems that there is a little more work to do on The Strand. 1) The pages from Index:The Strand Magazine (Volume 3).djvu need to be migrated to Index:The Strand Magazine (Volume 3).pdf. 2) Do you think it would be possible to finish the remaining volumes from IA, the ones in the second Excel file? Thank you for your hard work on it. Languageseeker (talk) 21:32, 30 September 2021 (UTC)

@Languageseeker: Pages migrated and MC data updated. I will work on the remainder of the upload, I haven't forgotten. I have some changes to make to my script first in response to guidance from the server admins. Inductiveload—talk/contribs 21:54, 30 September 2021 (UTC)

Thank you. Unless I'm mistaken, The Strand is the first periodical to have all of its public-domain volumes on enWS. Languageseeker (talk) 13:45, 8 October 2021 (UTC)

Is it possible to have a bot re-OCR texts for MC

I was wondering if it would be possible to have a bot that would re-OCR texts that will be featured in the MC. Most of them have OCR that is more than 10 years old. Better OCR would make them much easier to proofread. Or would it be possible to do it offline and reupload them? Languageseeker (talk) 22:19, 6 October 2021 (UTC)

Either is possible. Probably updating the file is slightly easier, but it also depends where the OCR came from and if there's red pages in the way. Also the new OCR tool will use a new Tesseract, same as if someone did it offline. It might be more scalable to have a gadget to load the OCR tool output on page create. Inductiveload—talk/contribs 22:38, 6 October 2021 (UTC)

Is writing a gadget feasible? I feel that a lot of users are doing this already and it might save people time to automate this. It would also ease the task of proofreading. Languageseeker (talk) 22:44, 6 October 2021 (UTC)

There is a PWB script that can query phetools or google ocr. Mpaa (talk) 22:56, 6 October 2021 (UTC)

Strand Merge

Can you merge Index:The Strand magazine - No 101 (May 1899).djvu into Index:The Strand Magazine (Volume 17).djvu with pg 1 = pg 495? Many thanks. Languageseeker (talk) 13:45, 8 October 2021 (UTC)

Done, I think that's the last of 'em! Inductiveload—talk/contribs 16:29, 8 October 2021 (UTC)

Thanks! Should be! Languageseeker (talk) 03:03, 14 October 2021 (UTC)

Dirty dirty diffs...

Open Page:Treasure Island (1909).djvu/35; enter edit mode; hit "Show changes" without touching the edit field. Do you get a clean (empty) diff, or do the header/footer and pagequality stuff show up in the diff? What happens when you repeat while logged out? Xover (talk) 14:03, 11 October 2021 (UTC)

@Xover I get a clean diff in both cases. Same when logged in as a "clean" user with basically no non-default settings. Inductiveload—talk/contribs 14:07, 11 October 2021 (UTC)

Bleh. Thanks. Something is messing with me, and it's nothing obvious. Xover (talk) 14:10, 11 October 2021 (UTC)

Turn on "Editing -> Show previews without reloading the page" in prefs and retry? Xover (talk) 14:33, 11 October 2021 (UTC)

@Xover urk: phab:F34684500. Inductiveload—talk/contribs 15:06, 11 October 2021 (UTC)

See T292676 (I totally stole your screenshot, man, and there's nothing you can do about it!) Xover (talk) 15:16, 11 October 2021 (UTC)

Lua error in Ppoem

I see this error message "Lua error: bad argument #1 to 'gsub' (string is not UTF-8)." on Page:The Great Gatsby (1925).djvu/7. Any idea what's causing it? Languageseeker (talk) 02:57, 14 October 2021 (UTC)

@Languageseeker: Probably this change. I've made a quick fix that seems to work. Xover (talk) 06:26, 14 October 2021 (UTC)

Thanks! 22:32, 14 October 2021 (UTC)

Copyright Renewals

When looking through the Recent Changes, I see that you're doing some work on Copyright renewal. You probably already know this, but the NYPL is working on a database for copyright renewals with initial release here. Languageseeker (talk) 22:11, 14 October 2021 (UTC)

toc to toc conversion

Did you do this manually? I ask (mostly) because I was going to suggest that mpaabot maybe learn how to do this, but I don't want to be rude if you have plans for your bot to do this.

Also, I ask because, either way, it was really nice and if it was manually accomplished, then it was really^3 nice.--RaboKarbakian (talk) 19:20, 16 October 2021 (UTC)

I have a TemplateScript script which mostly uses regexes to do it. It needs the {{TOC begin}} and {{TOC end}} adding manually and otherwise just makes a "best-effort" that needs manual tidying. The primary help is that {{dtpl}} and the {{TOC row ...}} templates have basically the same argument orders.

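				{
					name: 'Dtpl',
					position: 'replace',
					script: function ( editor ) {
						let text = editor.get();

						// Order matters: the more specific dtpl forms are rewritten first,
						// so the generic pattern only catches what is left over.
						text = text
							// {{dtpl| |{{gap}}…  ->  {{TOC row 1-dot-1||…
							.replace( /\{\{(dtpl|dotted TOC (page )?(line|listing))\|\s*\|\s*\{\{gap\}\}/gi, '{{TOC row 1-dot-1||' )
							// {{dtpl| |…  (empty first argument)  ->  {{TOC row 2dot-1|…
							.replace( /\{\{(dtpl|dotted TOC (page )?(line|listing))\|\s*\|/gi, '{{TOC row 2dot-1|' )
							// any remaining {{dtpl|…  ->  {{TOC row 1-dot-1|…
							.replace( /\{\{(dtpl|dotted TOC (page )?(line|listing))\|/gi, '{{TOC row 1-dot-1|' )
							// {{TOC page line|…  ->  {{TOC row 2-1|…  ('g' so every row is converted, not just the first)
							.replace( /\{\{(TOC page line)\|/gi, '{{TOC row 2-1|' );
						editor.set( text );
					},
					editSummary: 'Convert to {{TOC begin}}: as a single table, it\'s more likely to export cleanly'
				}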

Maybe one day I'll work out a way to do it more automagically, but today is not that day.

So let's just settle on ^2? But yeah, in general, {{dtpl}} is a siren that lures you onto the rocks of broken exports. {{TOC begin}} and friends are not universally loved by all, but they do map unambiguously onto tables so if we can think of something better, they're easy to swap out later. Inductiveload—talk/contribs 20:52, 16 October 2021 (UTC)

You got it ^2. And about the siren (Circe was a good friend to ole Rabo), dtpl is very person-friendly, so being able to lay out with that and have it converted to something more technically sensible has a very real appeal. Do you have that script in your RegExp editor gadget thing? Every time I look at that thing, the examples are (or were) "vote for me" templates and I cannot disable it quickly enough. I saw that only about 4 minutes passed between two of the TOC pages you edited. But the first took, like, an hour and 20 mins. Was that time spent writing the regexp instructions? If so, that didn't take long and I am impressed. That 4 minutes to make the changes work is its own kind of sirening....

My next script was going to benchmark png vs tiff saving. I thought it was GIMP but after using the scanner, I am pretty sure the problem is libpng, as even the scanner was confused with how long it was taking png to save and started its "scanning" progress bar again. Not that I am any great lover of libtiff.... --RaboKarbakian (talk) 02:26, 17 October 2021 (UTC)

The 1hr 20 was realising we needed {{TOC row 2dot-1-1}}, then having dinner then coming back and creating it. I have had the dtpl-killing script for a while.

I do recommend just never using {{dtpl}}, because it's pretty awful for various reasons. {{TOC row 2dot-1}} and {{TOC row 1-dot-1}} are no more complex to use (apart from adding a {{TOC begin}}). The arguments are basically the same: 1, 2 and maybe 3.

WRT TIFF, it will strongly depend on the compression, if any, you are using in the TIFF (TIFF is a wrapper format, not an image format per se, rather like PDF and DJVU), as well as the compression level used in the PNG. It also depends on whether you care more about speed or filesize. Both LZW and ZIP TIFF modes are pretty speedy, especially as LZW is explicitly designed for high speeds. But since they're not image-specific compressors, you will probably pay for that to some extent in filesize.

For example, File:Goblin_Market_029.tif compresses from TIF=23MB to PNG=16MB with a simple convert in.tif out.png command, but it takes quite a bit of time to do so (~11 seconds for me). pngcrush -brute on that PNG would save some more bytes, but it has not even finished yet. Of course, all options (TIFF/ZIP, TIFF/LZW and all PNG compression levels) are lossless, so there's no difference in the image: it's just a matter of how much you value your CPU time, Wikimedia's disk space (don't worry, they have a 9-figure budget, they'll be OK) and your own Internet bandwidth. Inductiveload—talk/contribs 20:09, 17 October 2021 (UTC)

Does pdf export work?

Hello. I would like to ask you as a person who knows a lot about exporting: I have just tried to export The Shoemaker's Apron into .pdf using the Download button in the top right corner, but I received only the main page with the title, note and contents. None of the subpages was exported, although I used the TOC begin and TOC row templates which are recommended for exporting books. Is there anything else wrong? --Jan Kameníček (talk) 18:28, 18 October 2021 (UTC)

Weird, I can't see anything clearly wrong, though I seem to recall an issue with a page with a quote in the title before but can't remember if that was the cause in the end. Punted to phab as phab:T293708. :-s Inductiveload—talk/contribs 21:56, 18 October 2021 (UTC)

E_TOO_MUCH_LATIN

If you've any idea what's going on here I'm all ears. Shirley it's not the impressive page count it's choking on. I note Commons is currently doing its usual stellar job choking on anything bigger than a 2MP JPEG, barely cranking out the file description page thumbnails, so I guess it may be giving PRP some crap data. But that seems like a weird way for PRP to fall down if so. Xover (talk) 18:42, 20 October 2021 (UTC)

@Xover hmm, the WS file page says 0x0px, but it's right at Commons. Smells like some kind of file storage/cache shenanigans rather than ProofreadPage. PRP just pulls the page count from file metadata, so this looks like a garbage-in, garbage-out situation. I think punting to phab is needed here unless there's a control surface we can wiggle to unstick something. There was a stuck image cache for a DjVu that was updated weeks ago, which has since magically self-resolved, so something weird is going on with files. It could be related, or it could just be more mystery. Inductiveload—talk/contribs 21:23, 20 October 2021 (UTC)

I added it to T192866 as the most likely culprit. Xover (talk) 18:16, 21 October 2021 (UTC)

Module:header

(Didn't know whether to message you here or try at Module talk:Header)

When I mistakenly tried to give a {{header}} a line

| author = [[Author:Horace James|Horace James]]

it freaked saying

"Lua error in Module:Header at line 186: attempt to index local 'target' (a nil value)."

Lines 184-6 are:

      local target = mw.title.makeTitle("Author", args[param])
      -- expensive function!
      if not target.exists then

What would be the Lua way to check if target was null / undefined, before then accessing the member of it?

And how does one give multiple authors to a {{header}}? Shenme (talk) 05:11, 21 October 2021 (UTC)

@Shenme a better error message would make sense here (and I'll do that), but the gist is that author should have been just Horace James. For multiple authors, you can use override_author (and then you add the [[Author:xxxxx]] links yourself). Inductiveload—talk/contribs 06:42, 21 October 2021 (UTC)

Hmm, so this was just in a sanity check function that fills Category:Works with non-existent author pages. Since the author comes out all messed up if you do this, I think just adding a defence will do (already done). Inductiveload—talk/contribs 07:34, 21 October 2021 (UTC)

Repurposing Sprint Code for MC

The Sprint idea for the MC has largely died because it always felt a bit artificial. I was wondering if it would be possible to reuse the code to add a label to a work instead; for example, Easy; Old English; Formatting, etc. In the database for the MC, there would be a parameter "label" which would control the text shown on the book cover. This label would be persistent. Adding a label can make it easier for users to know what they're getting into. Languageseeker (talk) 15:52, 23 October 2021 (UTC)

@Languageseeker Makes sense. We also already have some labels. Would we say that the first in the list gets the ribbon? Inductiveload—talk/contribs 19:56, 25 October 2021 (UTC)

I think it would make the most sense to have a separate data field called "label" that would control what is displayed in the ribbon. My intent is to use the ribbon to give users a sense of the difficulty or what needs to be done. For example: Easy; Long-S; Transclusion; Formatting; Images; Challenge. This is a request from a number of users who come from PGDP, who say that this helps users on PGDP to select texts. Languageseeker (talk) 19:59, 25 October 2021 (UTC)

@Languageseeker Ok, I'll take a look at adding it. It's indeed a shame to not use that shiny, shiny CSS! Inductiveload—talk/contribs 20:05, 25 October 2021 (UTC)

Indeed, that was some shiny, shiny CSS! :) Languageseeker (talk) 20:10, 25 October 2021 (UTC)

The New Europe

Hello. May I ask for help with an upload? I have downloaded two volumes of the journal The New Europe from HathiTrust and wanted to upload them to WS as they are not eligible for Commons. However, it seems that the Wikisource uploader does not support chunked uploads and so only files under 100MB can be uploaded. I have stored the two volumes at https://drive.google.com/drive/folders/1z-kkizbun9ItpgAw7-rhxdFLqXRdVBW1?usp=sharing . If you want, you can also convert them to djvu, but that is not necessary. It would really help! --Jan Kameníček (talk) 11:11, 25 October 2021 (UTC)

@Jan.Kamenicek in progress. The PDFs apparently will just not upload to Wikisource, though I mistakenly somehow got one up to Commons >_< So I have gone for the DJVUs which come out much smaller anyway due to being bitonal. Inductiveload—talk/contribs 13:41, 25 October 2021 (UTC)

I do apologize, but only now I have noticed that volume 3 has a missing map there. I did not notice it before because the pages of the map are not numbered. I have extracted the map from another file of the same volume (which has some other pages missing and so I did not choose it for upload) and it is available at https://drive.google.com/file/d/1Ig26VPcY34E5zCJBhrwEvHo4S4yRT25A/view?usp=sharing . After page no. 256 there are two empty pages and the map should go either between them or instead of them. Do you think you could add it there? --Jan Kameníček (talk) 13:54, 25 October 2021 (UTC)

@Jan.Kamenicek Sure thing. Anything for v3? Inductiveload—talk/contribs 13:56, 25 October 2021 (UTC)

This is for v3. I will correct the pagelist then. I have not noticed anything wrong in volume 4, I hope I did not overlook anything. --Jan Kameníček (talk) 13:59, 25 October 2021 (UTC)

@Jan.Kamenicek OK, it should be all done now. Inductiveload—talk/contribs 14:44, 25 October 2021 (UTC)

Great! Thanks very much! --Jan Kameníček (talk) 15:04, 25 October 2021 (UTC)

@Jan.Kamenicek any time! I can do the rest of TNE up to 1920 if you like? I just need the rest of the data in this format: https://docs.google.com/spreadsheets/d/14eecp1LxmKqZf0XqZaZIylcEBvaWqBjz Inductiveload—talk/contribs 15:38, 25 October 2021 (UTC)

I will certainly use this offer. I only have to find some time to choose the best copies. It is quite difficult as they are not directly accessible outside the US, so I have to download all of them using a Hathi download helper, which is veeeery sloooow, and only then I can go through them and choose. --Jan Kameníček (talk) 15:59, 25 October 2021 (UTC)

Now I can see that you have uploaded a different copy of volume 3. Unfortunately, this copy is missing a lot of various pages, including title pages of individual issues and the appendix at the end of the volume. The map is missing there too. May I ask to upload the copy from my GDrive, only adding there the map after page 256, please? I am sorry for bothering you again. --Jan Kameníček (talk) 17:53, 25 October 2021 (UTC)

@Jan.Kamenicek oh right, sorry. The PDF can't be uploaded due to the server upload issue, and I can't really convert it from PDF to DjVu easily. Can you just let me know which Hathi ID it is? It's much easier to just start from scratch in that case. Inductiveload—talk/contribs 18:57, 25 October 2021 (UTC)

It should be This one --Jan Kameníček (talk) 19:08, 25 October 2021 (UTC)

And the map from behind pg. 256, which needs to be extracted and added to the previously mentioned copy, is from this one. --Jan Kameníček (talk) 19:19, 25 October 2021 (UTC)

@Jan.Kamenicek

Done :-) Inductiveload—talk/contribs 06:26, 26 October 2021 (UTC)

I am really afraid to write about another problem here… Unfortunately, there are 2 extra pages in the copy. I prepared the .pdf copy on my computer a very long time ago and did not remember that I had removed the pages, but as you downloaded the copy anew, they are there again. After page 225 there is an extra page with some blue piece of paper and then page 225 again. Can you remove them, please? I do apologize for this never-ending story, but it should really be the last thing… I am very sorry. --Jan Kameníček (talk) 06:49, 26 October 2021 (UTC)

@Jan.Kamenicek here comes the boom! File:The New Europe - Volume 3.djvu XD Inductiveload—talk/contribs 07:15, 26 October 2021 (UTC)

Great! Thanks very much for all the effort! --Jan Kameníček (talk) 07:50, 26 October 2021 (UTC)

Reformatting PGDP Index

I'm working on importing The Elizabethan stage (Volume 4).pdf from PGDP. The Index is split and it runs for 116 pages. Do you know if there's some easy way to combine the pages. So 1, 2 - Page 1; 3,4 - Page 2; etc. Maybe a checkbox to just ignore odd or even pages? Languageseeker (talk) 20:29, 25 October 2021 (UTC)
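The pairing described here (scan pages 1 and 2 become page 1, 3 and 4 become page 2, and so on) is simple to express. This is a hypothetical Python sketch, not dp_reformat's actual code; merge_page_pairs is an illustrative name, and it assumes the PGDP text has already been split into a list of page texts in scan order.

```python
def merge_page_pairs(pages):
    """Join pages 1+2 -> new page 1, 3+4 -> new page 2, and so on.

    `pages` is a list of page texts in scan order. An odd trailing page is
    kept on its own. The two halves are joined with a single newline.
    """
    merged = []
    for i in range(0, len(pages), 2):
        pair = pages[i:i + 2]
        merged.append("\n".join(part.rstrip("\n") for part in pair))
    return merged
```

For example, five split pages come out as three combined pages, the last one unpaired.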

@Languageseeker Hmm, using dp_reformat? I don't think so at the moment. Can you point me to the raw text for testing? Inductiveload—talk/contribs 20:52, 25 October 2021 (UTC)

Raw text is here. Many thanks. Also, a quick find/replace a^t or a^{text} should be a^t/a^text and a_text or a_{text} should be a_text. Languageseeker (talk) 20:56, 25 October 2021 (UTC)

@Languageseeker done (I think) It's all a hack, but it kinda-sorta works ^_^. Inductiveload—talk/contribs 08:51, 26 October 2021 (UTC)

I've been testing it and it works wonderfully. Thank you. I've been using dp_reformat to import some of the more challenging/non-novel texts so that they can be run through the MC to teach users about formatting and to save lots of time proofreading these challenging/long texts. Languageseeker (talk) 16:17, 26 October 2021 (UTC)

Question about correcting errata and printer errors

It seems that the French are doing a better job at handling printer errors and errata than we are by striking a balance between silently correcting as PG and not correcting at all as enWS does. They have templates set up that allow users to enter corrections in PP which will not be transcluded, see [7]. Do you think it might make sense to create a discussion about importing this to enWS?

The implementation appears to be functionally exactly the same as {{SIC}}: <span class="coquille" title="{{{1}}}">{{{2}}}</span>. They just use some CSS to make it green in page-space: body.ns-104 .coquille { .... } Inductiveload—talk/contribs 15:34, 27 October 2021 (UTC)

I think there are differences in that the Page ns will indicate that a printer error has been corrected and that in the transclusion, there is an option under “Options d’affichage” (display options) to show the printer errors/errata corrected. Instead of displaying the correction with a tooltip, they directly transclude the corrected text. [8]. Also the French seem to have a sic template and an errata template to distinguish corrections made by the printer and those made by Wikisource. It seems like it might make sense to distinguish the two. Languageseeker (talk) 15:44, 27 October 2021 (UTC)

Looks like that's JS, something like Mediawiki:Gadget-Visibility. We can put it on the "would like to get working one day" list along with improving the whole visibility system. As usual, I kind of feel that this should be aiming to upstream into the Wikisource extension so all WSes can benefit.

In terms of the template here, {{SIC}} already does what it needs to to enable this (and more, actually). An {{erratum}} template to allow the proofreader to inline the erratum into the relevant location would be a good idea. All it has to do is set the class. Inductiveload—talk/contribs 15:50, 27 October 2021 (UTC)

Turns out there already is a {{errata}}. I agree that upstreaming it eventually would be a good idea. However, the current tooltip system is completely broken on mobile. Do you think it might be a good idea just to import the code for now and then worry about creating a more elegant/universal solution later? Also, I feel that {{sic}} and {{errata}} should be used for displaying and exporting like the French do because it creates a far better experience for users. Languageseeker (talk) 21:06, 27 October 2021 (UTC)

I do agree with {{sic}}, but it's been like that for ages and I don't have the energy for it. I would say fix SIC and then transition {{sic}} once it can be made toggle-able so people can choose.

{{erratum}} seems a good implementation using footnotes already, short of a full-on JS solution. Tooltips are broken indeed. A better thing would be some kind of popup. But....time and effort. I do not have bandwidth for that at the moment, but you can try to play with it as a user script. Inductiveload—talk/contribs 22:12, 27 October 2021 (UTC)

I completely understand. There is always so much to do and only so much that one user can do. I dare say that you do more than your fair share on this site. The good thing about templates is that they don't have to be perfect, only functional. Some day, someone may come along and write some brilliant code that will do everything perfectly. Until then, I'll have users mark printer errors with SIC and errata with erratum. Hopefully, more users will join soon and lighten everyone's burden. Languageseeker (talk) 21:57, 28 October 2021 (UTC)

Problem with MC

Sorry to be spamming you. Feel bad, but ... I'm trying to add two texts for the November MC that only require transclusion (for new users that want to practice transclusion and to help clear the backlog). However, despite being marked as "Not Proofread," they are still being sent into the Completed Texts section. Is there any way to fix this? Languageseeker (talk) 21:38, 27 October 2021 (UTC)

No worries. I think I have fixed it. Now the explicit status in the Lua table is all that is checked. Inductiveload—talk/contribs 21:57, 27 October 2021 (UTC)

That seems to have worked: partially. The not-transcluded text are indeed in the right spot, but texts that have initial = proofread now show up in the To Proofread section instead of the To Validate section. Probably a case of the programmer's conundrum: squash a bug, make a bug? Languageseeker (talk) 23:53, 27 October 2021 (UTC)

@Languageseeker second time lucky? Inductiveload—talk/contribs 16:27, 28 October 2021 (UTC)

Looks like it. Thank you! Languageseeker (talk) 19:23, 28 October 2021 (UTC)

ragged experiment2

After some attempts I would like to ask you about your opinion on the following experiment. It looks it works as intended. --Jan Kameníček (talk) 00:06, 28 October 2021 (UTC)

Hey, looks good, I didn't think of that! I put it at {{TOC row dotragged}} (with a tweak to avoid double-line-spaces). Inductiveload—talk/contribs 06:33, 28 October 2021 (UTC)

The Dotted cell template allows adding more space between the dots and also replacing the dots with a different symbol. I have been experimenting with this for the TOC row dotragged too, but have not succeeded. What do you think, would it be possible to add such a feature too? --Jan Kameníček (talk) 17:17, 28 October 2021 (UTC)

Finally I got it :-) Hope I did it right. --Jan Kameníček (talk) 17:58, 28 October 2021 (UTC)

@Jan.Kamenicek looks good to me. Thanks for figuring it out! Inductiveload—talk/contribs 20:11, 28 October 2021 (UTC)

ReOCR Index:This side of paradise (IA sideofparadise00fitzrich).pdf

Would you mind reOCRing this Index, which will be run in next month's challenge, if you have a moment?

Saw that you uploaded a new version of this Index. Thanks! Noticed a strange artifact: quotations are rendered as " instead of ", see This Side of Paradise - Fitzgerald - 1920.djvu/161 Languageseeker (talk) 20:56, 29 October 2021 (UTC)

@Languageseeker ha, I was just coming to say I tried the DjVu from the IA but the OCR is still not ideal, so I'm going to regenerate from JP2 and see how that goes. I do not know why the OCR is like that but I think it's probably a historical issue with the IA when that file was generated. Inductiveload—talk/contribs 21:08, 29 October 2021 (UTC)

Bot removing all curly apostrophes and quotation marks from Index:Tarzan and the Golden Lion - McClurg1923.pdf

It seems that users are spending quite a bit of time removing “” ‘’ from this work. Is there a way to use a bot to change “” to " and ‘’ to '? Languageseeker (talk) 12:03, 30 October 2021 (UTC)

@Languageseeker

Done

FYI, the command file for User:Inductiveload/quick pwb is something like

-prefixindex:Tarzan and the Golden Lion - McClurg1923.pdf
-namespace:Page
-summary:Convert curly-quotes to straight quotes for consistency in this work
-regex
[“”]
"
[‘’]
'

Inductiveload—talk/contribs 13:37, 30 October 2021 (UTC)
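For anyone not running pywikibot, the same two regex substitutions can be tried out locally on a page's text first. A minimal Python sketch using only the standard re module; straighten_quotes is an illustrative name, not part of pywikibot.

```python
import re

# The same two (pattern, replacement) pairs as in the command file above.
CURLY_TO_STRAIGHT = [
    (re.compile(r"[“”]"), '"'),
    (re.compile(r"[‘’]"), "'"),
]


def straighten_quotes(text: str) -> str:
    """Replace curly double and single quotes with straight ones."""
    for pattern, replacement in CURLY_TO_STRAIGHT:
        text = pattern.sub(replacement, text)
    return text
```

Note that ‘’ also covers apostrophes, so "It’s" becomes "It's", which is what is wanted for consistency in this work.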

Thank you! One of these days I'll get into PWB and then I'll probably regret not getting into it early, but, for now, I have a bit too much on my plate already. Much appreciated as always. Languageseeker (talk) 13:58, 30 October 2021 (UTC)

Page Shift for Index:Elizabeth Fry (Pitman 1884).djvu

@Xover: I updated the File on Commons because of several heavily damaged pages in the original version. However, now the proofread pages need to be shifted by +1 starting from Page 2. I'm pinging Xover in case you're not available. Languageseeker (talk) 14:16, 30 October 2021 (UTC)

In addition, DjVu pages 8 and 9 need to be replaced with [9] Languageseeker (talk) 15:10, 30 October 2021 (UTC)

A sound source using a bad copy, there are other files at Wellcome and NYPL: Internet Archive identifier: elizabethfry00pitm Internet Archive identifier: elizabethfry00pitm2 Internet Archive identifier: b24867160 Cygnis insignis (talk) 16:00, 30 October 2021 (UTC)

Internet Archive identifier: elizabethfry00pitm and Internet Archive identifier: b24867160 are the American editions; while Internet Archive identifier: elizabethfry00pitm2 is the second edition. Languageseeker (talk) 16:03, 30 October 2021 (UTC)

@Languageseeker ok, so what action is needed here? I can import any of the above, or do the page shift. Inductiveload—talk/contribs 17:15, 30 October 2021 (UTC)

I already imported the file and overwrote the previous one. Can you do the page shift and replace the pages? Languageseeker (talk) 18:04, 30 October 2021 (UTC)

move text from Elizabeth Fry (Pitman 1884).djvu/2 -> Elizabeth Fry (Pitman 1884).djvu/3, etc. Replace Elizabeth Fry (Pitman 1884).djvu/8 and Elizabeth Fry (Pitman 1884).djvu/9 with the images from [10]

Righto:

Done.

FYI, in future if you could also say exactly which pages to move in terms of a page range (or several page ranges) that helps me be sure what I am about to do aligns with what you had in mind. For example: you could say "pages 2-216, offset +1". It's OK if the range contains pages that don't exist. Otherwise, I have to figure out for myself that, yes, indeed the page range goes all the way to 216 and the whole range is an offset of +1. Inductiveload—talk/contribs 18:41, 30 October 2021 (UTC)
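A "page range plus offset" request maps directly onto a list of moves. Below is a hypothetical Python sketch (plan_page_shift is an illustrative name, not an existing tool) that also shows why, for a positive offset, the moves should run from the last page backwards: otherwise page 2 would overwrite page 3 before page 3 had itself been moved.

```python
def plan_page_shift(first, last, offset, existing):
    """Return (source, target) moves for pages first..last shifted by offset.

    `existing` is the set of page numbers that currently have text; pages in
    the range that don't exist are skipped, so an over-long range is harmless.
    For a positive offset, moves run from the highest page down so that no
    page is overwritten before it has been moved; otherwise from the lowest up.
    """
    pages = [n for n in range(first, last + 1) if n in existing]
    pages.sort(reverse=offset > 0)
    return [(n, n + offset) for n in pages]
```

For instance, "pages 2-216, offset +1" on an Index where only pages 1, 2, 3 and 5 exist gives the moves 5→6, 3→4, 2→3.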

Thank you for fixing this. I'll make sure to specify the page range in the future. Languageseeker (talk) 20:58, 30 October 2021 (UTC)

This seems wrong: Page:Elizabeth_Fry_(Pitman_1884).djvu/10 Mpaa (talk) 21:33, 30 October 2021 (UTC)

Jedi hand wave. The file is correct, these are the pages you are looking for. Inductiveload—talk/contribs 21:54, 30 October 2021 (UTC)

Error by Inductive Bot - Index:Whole prophecies of Scotland, England, Ireland, France & Denmark.pdf

Could you please tell me why the above Index is in The November Monthly Challenge? It was validated in April 2021. --kathleen wright5 (talk) 08:29, 1 November 2021 (UTC)

@Kathleen.wright5 It was added because it's defined as part of the November challenge (i.e. it is in Module:Monthly_Challenge/data/2021-11) and appears to have been added because it had not been transcluded. This has now been done as part of the Monthly Challenge. Inductiveload—talk/contribs 08:37, 1 November 2021 (UTC)

Also, general questions about why works were selected for the MC would be better at Wikisource talk:Monthly Challenge, as I was not personally involved in the selection of works for this month. Inductiveload—talk/contribs 09:06, 1 November 2021 (UTC)

@Kathleen.wright5 I'm also trying to include some maintenance projects into the MC. Right now, there are over 700 Indexes that have either been fully proofread or validated, but not transcluded. Even if enWS would transclude one-a-day, it would take over two years to clear the backlog. I'm hoping that by featuring some of these in the MC, it will help to clear this backlog. Languageseeker (talk) 15:18, 1 November 2021 (UTC)

Scan Repair for Index:A dictionarie of the French and English tongues - Cotgrave - 1611.djvu

Can you insert [11] and [12] after Page:A dictionarie of the French and English tongues - Cotgrave - 1611.djvu/27. And [13] and [14] after Page:A dictionarie of the French and English tongues - Cotgrave - 1611.djvu/235. Don't worry about moving the text, I can re-run match-and-split. Languageseeker (talk) 14:23, 2 November 2021 (UTC)

@Languageseeker Is that definitely all of them? Inductiveload—talk/contribs 14:57, 2 November 2021 (UTC)

No, I noticed that there are pages missing after 625, 769. Will keep on checking Languageseeker (talk) 15:22, 2 November 2021 (UTC)

Ok, those are the only two other gaps. After 625, [15] and [16]. After 769, [17] and [18]. Languageseeker (talk) 15:51, 2 November 2021 (UTC)
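Because each pair of inserted pages pushes every later page down, the final position of an old page depends on how many insertion points come before it. A hypothetical Python sketch, assuming the "after N" positions all refer to page numbers in the file before any insertions were made (new_page_number is an illustrative name):

```python
def new_page_number(old, insertions):
    """Map an old DjVu page number to its number after pages are inserted.

    `insertions` maps "insert after old page N" -> "number of pages inserted".
    Only insertion points strictly before `old` push it down; a page that
    pages are inserted *after* keeps its own number.
    """
    return old + sum(count for after, count in insertions.items() if after < old)


# Two pages each after old pages 27, 235, 625 and 769.
COTGRAVE_INSERTIONS = {27: 2, 235: 2, 625: 2, 769: 2}
```

With these four insertions, old page 27 stays at 27, old page 28 becomes 30 and old page 770 becomes 778.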

@Languageseeker

Done Inductiveload—talk/contribs 17:00, 2 November 2021 (UTC)

Thank you. I'm running the match-and-split. It's a 6.5 MB text file, so it'll take some time. Languageseeker (talk) 20:48, 2 November 2021 (UTC)

Bot Job for Index:A dictionarie of the French and English tongues - Cotgrave - 1611.djvu

Could you replace all instances of " " with a new line and a 1em indent. Also, could you remove all poem tags? Languageseeker (talk) 02:22, 4 November 2021 (UTC)

@Languageseeker I can do it, but it would have been ~900 times easier if that had been done before splitting. As usual, my recommendation is to slooooow dowwwwwn and think things over before rushing into the first action you think of. Inductiveload—talk/contribs 07:29, 4 November 2021 (UTC)

Actually I do not think this is correct. There are lots of multiple-spaces and not all of them are new lines:

Abandonnemént. ''at randome, dissolutely, licenciously,   profusely,with libertie.''

Abandonner: ''to abandon, quit, forsake, forgoe, waiue   or give ouer, shake or cast off, lay open, leaue at randome,   prostitute vnto, make common for, others; also,   to outlaw.''   Abadonner la vie de tel au premier qui le   pourra tuer. ''to proscribe a man; (is ever to be vnderstood   of a Soveraigne, or such a one as, next vnder   God, hath absolute and vncontrowlable power ouer   his life.''   s'Abandonner à plaisirs. ''sensually to yeeld, or become   a slave, vnto pleasure; wholy to captiuat, or deuote,   his thoughts to delights.''   Fille qui donne s'abandonne: Pro. ''A maid that   giveth yeeldeth.''   Il commence bien à mourir qui abandonne son   desir; Pro. ''he truly begins to die that quits his chiefe   desires.''

Ideally we want to find a transform that will allow us to leverage the Mediawiki definition list markup like this:

; Abandonnemént.
: ''at randome, dissolutely, licenciously, profusely,with libertie.''

; Abandonner:
: ''to abandon, quit, forsake, forgoe, waiue   or give ouer, shake or cast off, lay open, leaue at randome, prostitute vnto, make common for, others; also,   to outlaw.''
:; Abadonner la vie de tel au premier qui le pourra tuer.
:: ''to proscribe a man; (is ever to be vnderstood of a Soveraigne, or such a one as, next vnder God, hath absolute and vncontrowlable power ouer   his life.''
:; s'Abandonner à plaisirs.
::''sensually to yeeld, or become   a slave, vnto pleasure; wholy to captiuat, or deuote, his thoughts to delights.''
:; Fille qui donne s'abandonne: Pro.
:: ''A maid that   giveth yeeldeth.''
:; Il commence bien à mourir qui abandonne son desir; Pro.
:: ''he truly begins to die that quits his chiefe desires.''

Inductiveload—talk/contribs 12:15, 4 November 2021 (UTC)
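One candidate transform is to treat each entry as alternating plain-text headwords and ''italic'' glosses: the first pair becomes a top-level <code>;</code>/<code>:</code> item and later pairs become nested <code>:;</code>/<code>::</code> items. The Python sketch below (to_definition_list is an illustrative name) assumes every headword is plain text followed by exactly one italic gloss, which will not hold for every Cotgrave entry, so its output would still need checking. It also collapses runs of spaces to single spaces.

```python
import re


def to_definition_list(entry: str) -> str:
    """Convert "Head. ''gloss''  Sub-head. ''gloss''" into definition-list markup."""
    # With a capturing group, re.split alternates: outside italics, inside italics, ...
    parts = re.split(r"''(.*?)''", entry)
    lines = []
    for i in range(0, len(parts) - 1, 2):
        head = " ".join(parts[i].split())      # collapse runs of whitespace
        gloss = " ".join(parts[i + 1].split())
        if i == 0:
            lines.append(f"; {head}")
            lines.append(f": ''{gloss}''")
        else:
            lines.append(f":; {head}")
            lines.append(f":: ''{gloss}''")
    return "\n".join(lines)
```

On the Abandonnemént. line above, this produces exactly the two-line <code>;</code>/<code>:</code> form shown in the target markup.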

Defaults...

Module:New texts currently defaults to Template:New texts/data.json with predictable results:

{{#invoke:New texts|new_texts|offset=9|limit=12}}

Wouldn't it make more sense to default to the current year when no |year= is given? (cf. Template:New texts/sandbox) Xover (talk) 08:44, 8 November 2021 (UTC)

Sure, it was just a holdover from when the current data was just at "data.json". diff. Inductiveload—talk/contribs 08:49, 8 November 2021 (UTC)

That iffy feeling…

The one that says, maybe I don't want to open the lid on that mystery container at the back of the fridge because who the heck knows what will come crawling out. I've been having that feeling for a while regarding the magical mystery black box that is phetools. But since the PWB thing forced my hand I've had to start opening lids. Let me illustrate by the pseudocode version of the algorithm that makes the Phe OCR gadget so fast:

titles = SELECT page_title FROM <Index: namespace on enws>;
for title in titles
    if not exists ocr_cache[title] then
        generate_ocr(title)

Because the flip side of the fridge horror above is the feeling you get after fixing a bot that's been dead for a while and discovering it's decided to download every single PDF and DjVu file on Commons to warm its OCR cache. Having to do emergency database surgery to excise the ~70k jobs already queued up in its internal grid engine manager before the Toolforge admins come 'round to have a wee bit of a chat is… Well, I don't recommend it as a habit.

This thing is so clearly an eldritch horror poking its icy cold tentacles through a weak spot in the skein between dimensions. Maybe not Cthulhu itself, but surely Th'rygh, The God-Beast or Sho-Gath, The God in the Box. Xover (talk) 18:32, 8 November 2021 (UTC)

Ha, well, technically using a nuclear weapon is still "warming", even if people standing nearby get a bit grumpy. Inductiveload—talk/contribs 18:45, 8 November 2021 (UTC)

Hws & hwe

I just saw a note you left with another user. Just what? How are we to know when changes like this happen? I used to get changes on the Scriptorium page on my Watchlist which I check but it doesn’t seem to be showing up anymore. It seems a major change to me, as a proofreader and quite distressing to be oblivious of it happening. I am working on a Beginners’ proofreading guide. Can you tell me of any other changes that I may be unaware of? I’ve noticed you seem to have your finger on the pulse. I’d appreciate the support. Cheers, Zoeannl (talk) 23:30, 8 November 2021 (UTC)

@Zoeannl: I'm using a phone, so I can't get too wordy, but the hyphenation thing was introduced in September 2018: Wikisource:Scriptorium/Archives/2018-09#Words_hyphenated_across_pages_in_Wikisource_are_now_joined.

Probably the other biggest recent change is H:Page styles, which allows index-specific CSS. I can go into more detail tomorrow if you like. Inductiveload—talk/contribs 23:59, 8 November 2021 (UTC)

Zoeannl, this doesn't mean that you need to stop using HWE: the community has not deprecated it and it is still supported; there is just now an alternative. There are still situations where HWE has to be used. Note that we did recently have a conversation about needing to get better at announcing changes as they take place. — billinghurst sDrewth 23:00, 9 November 2021 (UTC)

Pop goes the… Extension?

In case you're not aware: mw:Extension:Popups. Xover (talk) 08:59, 10 November 2021 (UTC)

@Xover I have seen this, but I counter with: Popups Reloaded is wayyyyy betterer, and does lots of cool stuff ner-ner ner-ner. Inductiveload—talk/contribs 09:04, 10 November 2021 (UTC)

Yeah, I haven't looked at it; I just ran across the link and figured it might be relevant due to Reloaded (which I haven't looked at either). Incidentally, I'm cross-loading the enwp upstream of Popups instead of our locally-ported copy, and it is much nicer. "Good enough" rather than "Great", but everything is relative. Xover (talk) 09:58, 10 November 2021 (UTC)

@Xover reloaded is far from done, but even now 1) it's got lots of fun WS'y features (page image on hover anyone?) and 2) it's designed to allow pluggable extra modules (though the API for that isn't baked yet, so caveat implementor). Inductiveload—talk/contribs 10:03, 10 November 2021 (UTC)

Adding border option for {{FI}}

Would it be possible to add a parameter to set the border in {{FI}}? Some books have black borders around images that this template cannot handle, requiring custom CSS code; see Index:Negro poets and their poems (IA negropoetstheirp00kerl).pdf. Languageseeker (talk) 16:34, 8 November 2021 (UTC)

@Languageseeker Hmm, that would need an extra <span>, since the [[File:...]] markup doesn't accept a style parameter. Looks like the imgstyle parameter was an attempt to do that and I failed at it.

As you seem to have found, index-based CSS will do this happily, and there is both a cclass and imgclass to assist in targeting the CSS if needed. Inductiveload—talk/contribs 17:16, 8 November 2021 (UTC)

I see, thank you. Languageseeker (talk) 17:33, 8 November 2021 (UTC)

@Languageseeker: Unless there was a change in templates, we are talking about two templates. {{FI}} is a <div></div> based template and {{FIS}} is <span></span> template. The difference was to allow text to flow around the frame unbroken. Which one are you referring to? — Ineuw (talk) 07:45, 13 November 2021 (UTC)

I created a sample using both templates User:Ineuw/Sandbox4.— Ineuw (talk) 08:04, 13 November 2021 (UTC)

The templates surround both the image and the caption with a border. The desire is to have an option that would just surround the image. Languageseeker (talk) 03:32, 15 November 2021 (UTC)

Option to Export Text Layer of an Index

I was wondering if there's an option to export the entire text layer of a Index similar to how PGDP can export the concatenated text file. The output would be something like this

====Page:1====
Status: N (Not Proofread) | B (No Text) | P (Proofread) | V (Validated)
<header>
header text

<body text>
body text

<footer>
footer text

====Page:2====

This would be a great way to be able to search for common problems in the Index and also to be able to have a copy of the raw wikicode for an Index. Languageseeker (talk) 17:33, 8 November 2021 (UTC)
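A rough Python sketch of producing that concatenated format, assuming the page texts and statuses have already been fetched (for example through the MediaWiki API or pywikibot, which is not shown). PageText and export_text_layer are illustrative names, and the status letters follow the proposal above.

```python
from dataclasses import dataclass

STATUS_CODES = {"N": "Not Proofread", "B": "No Text", "P": "Proofread", "V": "Validated"}


@dataclass
class PageText:
    number: int
    status: str          # one of the keys of STATUS_CODES
    header: str = ""
    body: str = ""
    footer: str = ""


def export_text_layer(pages):
    """Concatenate pages into one text file in the proposed ====Page:N==== format."""
    chunks = []
    for page in pages:
        if page.status not in STATUS_CODES:
            raise ValueError(f"unknown status {page.status!r} on page {page.number}")
        chunks.append("\n".join([
            f"====Page:{page.number}====",
            f"Status: {page.status} ({STATUS_CODES[page.status]})",
            "<header>", page.header, "",
            "<body text>", page.body, "",
            "<footer>", page.footer, "",
        ]))
    return "\n".join(chunks)
```

The result is plain text, so it can be grepped for common problems or kept as a raw-wikicode backup of the Index.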

@Languageseeker hmm, interesting. It's probably pretty easy to do with Python.

https://book2scroll.toolforge.org/ is similar, but without the "save as file" and wikitext options. Inductiveload—talk/contribs 17:36, 8 November 2021 (UTC)
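As a rough illustration of the "pretty easy with Python" part, here is a minimal sketch that only handles the formatting step: it assumes the page data has already been fetched (for example from the API) into dicts with hypothetical keys `number`, `status`, `header`, `body` and `footer`, and it produces the concatenated layout proposed at the start of this thread. It is not an existing tool.

```python
def format_index_dump(pages):
    """Concatenate already-fetched Page: data into one text file.

    Each item in `pages` is a dict with the (hypothetical) keys
    'number', 'status', 'header', 'body' and 'footer'.
    """
    chunks = []
    # Sort by page number so the dump follows the order of the scan.
    for page in sorted(pages, key=lambda p: p["number"]):
        chunks.append(
            f"====Page:{page['number']}====\n"
            f"Status: {page['status']}\n"
            "<header>\n"
            f"{page['header']}\n\n"
            "<body text>\n"
            f"{page['body']}\n\n"
            "<footer>\n"
            f"{page['footer']}\n"
        )
    return "\n".join(chunks)


pages = [
    {"number": 2, "status": "P", "header": "", "body": "Second page.", "footer": ""},
    {"number": 1, "status": "V", "header": "{{rh||TITLE|}}", "body": "First page.", "footer": ""},
]
print(format_index_dump(pages))
```

Writing the result to a file is then a single `open(...).write(...)`; the hard part, as discussed below, is agreeing on the format, not producing it.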

Is there any way to avoid Python? That seems to add quite a layer of complexity for most users. Languageseeker (talk) 17:51, 8 November 2021 (UTC)

Well, once the Python exists, it can be deployed on Toolforge. 100x easier than getting it into the extension (for one, it would need a formal format to be defined). Inductiveload—talk/contribs 17:58, 8 November 2021 (UTC)

I was thinking that maybe ProofreadPage might actually be the best place to add this code. Ideally, there should be an option to export/import a project. Right now, there's no easy way to export the data from a project or recreate it in another space. This would generate a zip file containing a text file that starts with information about the file and its metadata, followed by the text from all the Pages, as well as the original scan and any media files. Languageseeker (talk) 21:50, 8 November 2021 (UTC)

That could still be done on Toolforge. Building it into the extension is a good idea, but 1) you need a very well-defined, fixed format and 2) it takes an enormous amount of effort and an even larger amount of time to get non-trivial patches accepted in the extensions. Easily an order of magnitude lower velocity than a Toolforge tool. Inductiveload—talk/contribs 22:34, 8 November 2021 (UTC)

I see. It's a practicality issue. Could you add it to your already too long list of things to do? This will be important to users who wish to proofread and also to those who wish to have a complete archive of an Index, either to use on a different wiki instance or in whatever capacity they wish. It's also a key to fulfilling the open-access philosophy/promise of Wikisource. Languageseeker (talk) 00:42, 9 November 2021 (UTC)

Ideally, there would be two levels of backup:

A pure textual one consisting of a concatenated text file.
A full backup that could be imported into a clean wiki setup. This would include:
1. The Scan (PDF, DJVU, Images)
2. The Metadata for the Scan
3. The CSS File
4. The concatenated text files
5. Any media files and their metadata
6. Any Templates and their documentation used
7. Any pages where the scan is transcluded.
8. Any pages that link to the transclusion. These will probably be Author pages or Versions pages. Languageseeker (talk) 02:34, 9 November 2021 (UTC)

Forgot to ask the all-important question. Do you think that this is something that you can do, or do you not have the bandwidth/time for it? Languageseeker (talk) 15:24, 9 November 2021 (UTC)

@Languageseeker Honestly, I do not think there's a lot of benefit to building this into the PHP. Any format would be extremely specific and not generally useful. All the information you need is explicitly available on the API. I think you should really be thinking about what you want to achieve here. You use the word backup, so this makes me think that you're thinking of some kind of archival purpose rather than any proofreading-related purpose. Database dumps of the whole of Wikisource are made every month or so, so if you're after an archive, maybe you can check them out.

In short, I do not really have time or inclination for any involved tool without a pretty solid "business case". On the other hand, a quick hack-up of "get the wikitext of every page in an index" is not very hard. Inductiveload—talk/contribs 15:32, 9 November 2021 (UTC)

For me, there is both a use for proofreading and for backup.

I think the business case for backup would be to make it possible for users/institutions to get their work out of this project. I can imagine many cases where institutions or users may want, or be willing, to use the Wikisource platform to proofread as long as they are able to get the work out in an easy way. This weekend, on LinguaLibre, there was a similar case where a user was willing to contribute because they were expecting that they could download their pronunciations easily. As it turned out, there is no such way, which caused some embarrassment and led to the team downloading every pronunciation manually to avoid losing the user. I think that WS faces the same problem. Say the NLS would like to download their chapbooks. How would this be possible? Think about how many Indexes on enWS have images or repaired files. I've imported quite a few works from PGDP and one of the constant challenges that I face is that the text file does not correspond to the actual file on IA or HT. Keeping the text file with the image files/scan will make it possible to actually back up the work.

For proofreading, a system to import/export an Index will have several benefits. First, it will enable users with slower internet connection to contribute without having to worry about long load times or losing data. Second, it will also enable users to proofread an entire text or search for common errors. Finally, it will also make it easier to locate a specific error.

I think that "a quick hack-up of "get the wikitext of every page in an index"" is a great start and would be a wonderful thing to have. Would at least that be possible? Languageseeker (talk) 15:48, 9 November 2021 (UTC)

@Languageseeker Right, but what's a pile of wikitext, Lua and images going to achieve? You'd have to import it into a near-perfect clone of Wikisource as it was at the time of export. For what purpose? In case Wikisource gets nuked? WS Export already provides HTML export, as well as PDF, ePub, MOBI, text and RTF. Wikitext export is already entirely possible via the API or DB dumps, but the result is wikitext, which is essentially useless except for feeding back into a wiki, or perhaps for some kind of offline match-and-split-like workflow that feeds back into Wikisource itself.

Anything more than a straight wikitext dump of the pages in an index is weeks of work and an ongoing maintenance burden, so you really need to explain what it's for, other than "man, wouldn't it be fun if".

And if you do want to feed back into a wiki locally, then we already have Special:Export (probably with Special:PrefixIndex assuming people have done the Right Thing and used subpages properly), as well as the aforementioned DB dumps, API access and the Wikisource-dedicated export tooling. Inductiveload—talk/contribs 16:03, 9 November 2021 (UTC)

I don't think that it's going to be anywhere near a perfect or easy process to import the files into another system. However, it would be possible. Creating an export for an index will enable users to do what they wish with the data in an easy and convenient manner. That is why the three most important aspects to export are the text layer, scan, and images. The other features are nice to have (especially transclusion ranges), but are not strictly necessary. For me, this is a central pillar of a commitment to maintaining open access to the information produced. Anyone should be free to take the raw data produced on enWS and do with it as they please. Languageseeker (talk) 16:29, 9 November 2021 (UTC)

But it's possible now. Getting the relevant data from the API and/or a DB dump is no harder than getting the data out of some special-sauce WS-specific package format. In fact, it's probably easier, because there's probably already tooling for handling DB dumps in whatever language the user wants (certainly Python and PHP).

There's still no concrete use case beyond "sounds fun". You need to find a client for this feature and make sure what you're proposing actually works for them. Bulk archiving is already provided by the software. Yes, Special:Export is missing the images, but that's a defect in the core (phab:T15827, 13 years old), and should "just" be fixed (ahahahaha, I crack myself up) there rather than getting me to do more of the WMF's homework and piling on more external tools to paper over lack of upstream interest. Tl;dr go and complain at them.

A way to generate a Special:Export package for all the pages in an index without having to use Special:PrefixIndex may also make sense (i.e. leverage the tools we already have).

A dump of wikitext in one big file I can understand, because then you can use a text editor to do various fixes without needing to bot them in "live" (though you'll still need a bot to upload at the end). Inductiveload—talk/contribs 16:46, 9 November 2021 (UTC)

Alright, then could just the feature to export all the Pages in an Index to a txt file be added? Languageseeker (talk) 17:00, 9 November 2021 (UTC)

For such an edge case, what is wrong with Special:Export and let the users work out how they manage it. I sometimes wonder why we are trying to replicate rarely utilised functionality when we have needed improvements. If it is something important, stick it into phabricator: with all the other TO DOs. — billinghurst sDrewth 22:34, 9 November 2021 (UTC)

I don't think these are edge cases. I've been thinking about this more, and here are some real scenarios in which this could help.

Checking the formatting of an entire Index. For example, say that an Index has plates that should all be 500px. Right now, if you want to verify that all the images are in fact 500px, you would need to open every page, then click on edit, and then check. (Also, hope that the pages with images are actually marked.) This can be quite time consuming. If you had all the pages as a single text file, then you can just use find to check them all. This case can be generalized out.
Finding an error in a book. When I read a WS text on my Kobo, sometimes I notice obvious scannos like "1t." On the Kobo, I can highlight the text which saves it to an annotation file. However, Kobo will save this as "Chapter 10: LETTER VII." So, if I want to find this error, I need to go to the transcluded work, find the right chapter, search in the chapter for the text, and then click on the page. It's a huge time waste. It gets worse when there are multiple Letter VII.
The ability to import the text would also make it easier to correct common errors such as "— " or curly quotes.
The ability to export/import images would make it much easier to replace poor quality images. Recently, I worked on replacing all 174 images in Index:The Adventures of Huckleberry Finn (1884).pdf because the existing images were cropped from the DJVU. The ability to export them with an accompanying XLS file would save a ton of time when it would come to reuploading them. As long as there is a reason column, there should be no technical barrier to using a script to override all 174. That is far faster than manually reuploading 174 files.
It could also become possible to generate metadata for missing images. It would generate the metadata for all the missing images in an XLS file. Once the images are added to the folder, the script could upload them without the user having to manually create the metadata. This would greatly speed up the adding of images.
In the long run, a proper system for importing/exporting text would enable the creation of an offline proofreading interface similar to AWB. There are many cases in which users might have a slow connection or just loading images from PDF/DJVU is simply a slow process. Creating a way to download/upload Indexes and individual pages would greatly speed up the work. Languageseeker (talk) 01:54, 10 November 2021 (UTC)
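For the first scenario, once an Index has been concatenated into one text with `====Page:N====` markers (the format proposed at the start of this thread), a regular expression can list every image whose width differs from the expected value. This is a minimal sketch under that assumption; `odd_image_widths` is a hypothetical helper, and it only recognises the plain `|NNNpx` width option inside `[[File:...]]` links.

```python
import re

PAGE_MARKER = re.compile(r"^====Page:(\d+)====$", re.MULTILINE)
# Only handles the plain "|NNNpx" width form inside [[File:...]] links.
FILE_WIDTH = re.compile(r"\[\[File:[^\]|]+(?:\|[^\]|]*)*?\|(\d+)px\b")


def odd_image_widths(dump, expected=500):
    """Return (page number, width) pairs for File links not sized `expected`px."""
    problems = []
    parts = PAGE_MARKER.split(dump)
    # split() with one capture group yields [before, num1, text1, num2, text2, ...]
    for number, text in zip(parts[1::2], parts[2::2]):
        for match in FILE_WIDTH.finditer(text):
            width = int(match.group(1))
            if width != expected:
                problems.append((int(number), width))
    return problems


dump = (
    "====Page:1====\n[[File:Plate 1.jpg|500px|center]]\n"
    "====Page:2====\ntext\n[[File:Plate 2.jpg|center|450px]]\n"
)
print(odd_image_widths(dump))
```

The same "one big text file plus a search" approach covers the scanno case too, since the page marker tells you which Page: to open.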

@Languageseeker: Re I need to go to the transcluded work, find the right chapter, search in the chapter for the text, and then click on the page. It's a huge time waste. It gets worse when there are multiple Letter VII. As a semi-tangent, you may appreciate the replace tool in User:Inductiveload/maintain, which allows you to highlight the text 1t and replace it directly in the Page namespace, if possible (usually it is).

correct common errors such as "— " or curly quotes. Functionally, AWB, JWB or PWB (perhaps with User:Inductiveload/quick pwb) are existing tools that can do this already.

You can already get access to images in a given index, e.g. https://en.wikisource.org/w/api.php?action=query&format=json&formatversion=2&prop=images&generator=prefixsearch&gpssearch=The%20Strand%20Magazine%20(Volume%201).djvu&gpsnamespace=104&gpslimit=2000

Ditto for the content: https://en.wikisource.org/w/api.php?action=query&format=json&prop=revisions&list=&generator=prefixsearch&formatversion=2&rvprop=ids%7Ctimestamp%7Cflags%7Ccomment%7Cuser%7Ccontent&rvslots=main&gpssearch=The%20Strand%20Magazine%20(Volume%201).djvu&gpsnamespace=104&gpslimit=2000 In fact, a thin wrapper over this interface is all the putative Toolforge "exporter" would be anyway. If you're already using scripts, you should just hit that API yourself and then you get more control anyway.

Most of what you're asking is already possible, and if you're already using a custom script, the normal API is much more reliable, available, tested and stable than any Toolforge tool would ever be. It still sounds to me like you are coming up with solutions to problems before you've actually worked out a workflow that has the problems. Inductiveload—talk/contribs 21:58, 10 November 2021 (UTC)
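The long query URLs above are just lists of key/value parameters. A minimal sketch of building the content query from a dict makes them easier to read and change; the parameters are copied from the URL above (the empty `list=` is dropped), and `page_content_query` is a hypothetical helper name.

```python
from urllib.parse import urlencode

API = "https://en.wikisource.org/w/api.php"


def page_content_query(index_file):
    """Build the revisions-content query above for every Page: of `index_file`."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "generator": "prefixsearch",
        "gpssearch": index_file,
        "gpsnamespace": "104",  # the Page: namespace on English Wikisource
        "gpslimit": "2000",
        "prop": "revisions",
        "rvprop": "ids|timestamp|flags|comment|user|content",
        "rvslots": "main",
    }
    # urlencode escapes spaces, brackets and the "|" separators for us.
    return API + "?" + urlencode(params)


print(page_content_query("The Strand Magazine (Volume 1).djvu"))
```

Swapping `prop` to `images` (and dropping the `rv*` parameters) gives the image query instead.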

Wow, I did not realize how amazing the API was. However, when I try to get the raw content for Mansfield Park or frWS, it seems that it does not show the content for all the pages and the pages are out of order. Is there any way to show the content for all the pages in order? Languageseeker (talk) 01:48, 11 November 2021 (UTC)

At some point you're going to need to process the data anyway. Sorting that array is a one-liner in Python: pages.sort(key=lambda page: int(page['title'].split('/')[-1])).

I see 308 pages there, which looks right by Index:Austen - Mansfield Park, vol. II, 1814.djvu? Remember the JSON array is 0-indexed, but the page numbers are 1-indexed.

When the generator for index pages is ready (currently work-in-progress), that will be the better option. Inductiveload—talk/contribs 07:46, 11 November 2021 (UTC)
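To spell out the one-liner: titles must be sorted by the integer after the last `/`, because a plain string sort puts `/10` before `/2`. A minimal sketch (returning a new list with `sorted` rather than sorting in place), using made-up titles:

```python
def sort_pages(pages):
    """Sort API page dicts by the page number at the end of the title.

    Titles look like 'Page:Some book.djvu/12'; sorting the raw strings would
    put '/10' before '/2', hence the int() conversion.
    """
    return sorted(pages, key=lambda page: int(page["title"].split("/")[-1]))


pages = [{"title": "Page:X.djvu/10"}, {"title": "Page:X.djvu/2"}, {"title": "Page:X.djvu/1"}]
ordered = sort_pages(pages)
# ordered[0] is page 1: list positions start at 0, page numbers at 1.
print([p["title"] for p in ordered])
```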

While it correctly identifies all the pages, at some point, it stops outputting the revisions field which contains the content, see User:Languageseeker/sandbox3. Also, which generator are you discussing? Sorry, if I've missed something obvious. Languageseeker (talk) 12:31, 11 November 2021 (UTC)

@Languageseeker the generator is the one that will be implemented in phab:T291490. I'm halfway through doing it. Deployment will be when it will be. I have to finish it, and then shepherd it through code review.

For the data there, that's because you are not logged in, so you have lower API limits (50 vs 500). You will either need to make the API query from some logged-in session, or handle the continue.rvcontinue field correctly. Note that since some books are over 500 pages long, you need to handle the continue data anyway. Inductiveload—talk/contribs 13:06, 11 November 2021 (UTC)
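Handling `continue` follows a generic pattern: copy everything in the response's `continue` object into the next request until it stops appearing, and merge pages across batches, because a page whose content is missing from one batch can turn up in a later one. A minimal sketch, assuming `formatversion=2` with `rvslots=main` (so content sits at `revisions[0].slots.main.content`); `fetch` is a hypothetical callable standing in for an HTTP call such as `requests.get(API, params=p).json()`.

```python
def fetch_all(fetch, params):
    """Yield the 'query' part of every batch, following the API's 'continue' data.

    `fetch` is any callable that takes a params dict and returns the decoded
    JSON response.
    """
    request = dict(params)
    while True:
        response = fetch(request)
        yield response.get("query", {})
        if "continue" not in response:
            break
        # Copy every continuation key (rvcontinue, continue, ...) into the
        # next request, keeping the original parameters.
        request = {**params, **response["continue"]}


def collect_contents(batches):
    """Merge page content from all batches into {title: wikitext}."""
    contents = {}
    for query in batches:
        for page in query.get("pages", []):
            for rev in page.get("revisions", []):
                contents[page["title"]] = rev["slots"]["main"]["content"]
    return contents
```

Combined with the page-number sort above, `collect_contents(fetch_all(fetch, params))` gives every page's wikitext, whatever the per-request limit is.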

Thank you for your wonderful and detailed explanation. I'm looking forwards to seeing the generator when it is done. It sounds very cool. Languageseeker (talk) 01:04, 12 November 2021 (UTC)

Transcluding

Latest comment: 2 years ago, 5 comments, 2 people in discussion

I am trying to transition from using {{page}} to using <pages/>. With your help, this has gone well, but I face a new problem where the book I am working on is missing two pages (discovered late in the game unfortunately). I handled this in my usual fashion which is to import (via JPEGs) the two pages needed from another copy of the book. The pages in question are pp. 412-413 (see Index:The Reminiscences of Carl Schurz (Volume Two).djvu). My problem is I don't know how to transclude the patch pages except by using {{page}}. I have done this for The Reminiscences of Carl Schurz (book)/Volume Two/Chapter 8, but the transition between pp. 411 and 412 is poor (the one between pp. 413 and 414 worked fine since p. 413 ends with a complete paragraph). How do I make this work smoothly without using {{page}}? Thanks for any suggestions. Bob Burkhardt (talk) 17:08, 15 November 2021 (UTC)

@Bob Burkhardt the best thing to do here is to repair the scan by inserting the missing pages, then you can keep it normal. I did this using those two files and then moved the pages into their new homes and adjusted the transclusion (obviously this is easier when it's the last chapter of a book!). Wikisource:Scan Lab exists for this kind of repair - if you notice that a book has a defect, you can get it fixed there and hopefully it'll be done before you get to the pages in question. Inductiveload—talk/contribs 17:44, 15 November 2021 (UTC)

Thank you for bringing the new resource to my attention. There is still a problem: Page:The Reminiscences of Carl Schurz (Volume Two).djvu/484 has an image for p. 413, rather than p. 412 as it is supposed to. Can you fix this for me or have it fixed? Bob Burkhardt (talk) 18:12, 15 November 2021 (UTC)

@Bob Burkhardt Sorry, that was my fault, I uploaded the wrong "fixed" file. Should be OK now. Thanks for checking. Inductiveload—talk/contribs 18:22, 15 November 2021 (UTC)

Looks good. Thank you for grappling with this. Bob Burkhardt (talk) 18:29, 15 November 2021 (UTC)

Stash Bug and Batch Upload

Latest comment: 2 years ago, 1 comment, 1 person in discussion

With the stash bug fixed, is it possible to do batch uploads of periodicals again? I think that The Dial, Volume 75 is a good example of how having scans enables users to scan-back works published in periodicals. Languageseeker (talk) 14:54, 16 November 2021 (UTC)

Unpurgeable stale thumbnails

Latest comment: 2 years ago, 2 comments, 2 people in discussion

cf. T215558. You don't happen to have any current examples of files with stale thumbnails that can't be purged? Xover (talk) 19:03, 17 November 2021 (UTC)

@Xover can't think of any, sorry! Inductiveload—talk/contribs 19:27, 17 November 2021 (UTC)

Hanging indent

Latest comment: 2 years ago, 3 comments, 2 people in discussion

Hi. Do you have any suggestion on how to manage the indentation e.g. in the first 2 lines of Page:Dictionary_of_National_Biography._Errata_(1904).djvu/296? The first line could be managed with {{hi}} but what about the second? If there are no existing templates that can be simply combined without being too hacky, do you have any suggestions for a custom template? Also considering that it would be used all over the place. Thanks Mpaa (talk) 22:11, 19 November 2021 (UTC)

@Mpaa hanging indents are fundamentally a hack of a negative text-indent and a padding or margin to give the first line a space to "hang" into. Usually the two are the same (but one is negative). So what it looks like you need there is a padding greater in magnitude than the negative text-indent:

<div style="border: 1px solid green; padding-left:4em; text-indent:-2em;">
{{lorem ipsum}}
</div>

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

Inductiveload—talk/contribs 18:39, 21 November 2021 (UTC)
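The arithmetic behind the example: padding-left sets where the wrapped lines start, and the negative text-indent pulls only the first line back, so the first line starts at padding-left + text-indent (4em − 2em = 2em above). A dedicated template would need to expose both offsets; this minimal Python sketch of the calculation uses a hypothetical function name and is not an existing template or module.

```python
def hanging_indent_style(first_line_em, rest_em):
    """Return inline CSS for a block whose first line starts `first_line_em`
    from the left edge and whose following lines start `rest_em` from it.

    padding-left places the wrapped lines; the negative text-indent
    pulls only the first line back towards the edge.
    """
    text_indent = first_line_em - rest_em
    return f"padding-left:{rest_em:g}em; text-indent:{text_indent:g}em;"


print(hanging_indent_style(2, 4))  # the bordered example above
print(hanging_indent_style(0, 2))  # a classic hanging indent
```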

Thanks. A change in {{hi}} to set "text-indent" independently would do, but I can't see a way of keeping a nice interface and backward compatibility. Maybe a dedicated template would be better then. Mpaa (talk) 20:32, 21 November 2021 (UTC)

Switching from px to em

Latest comment: 2 years ago, 2 comments, 2 people in discussion

I didn't dismiss your advice about using "em" instead of "px". At first when I implemented it, the images were smaller than what I was used to. Then you reminded me about adding "!important" which made a difference, but still the image is 8 pixels smaller when converting pixels to em and measured with a pixel ruler. Could you please look at this page with the two images and tell me what I am doing wrong? Same images two sizes.— Ineuw (talk) 04:30, 21 November 2021 (UTC)

It looks like the font size is actually set to 14px because there's a global CSS rule: .vector-body { font-size: calc(1em * 0.875); }. With the usual 16px base, 16px * 0.875 = 14px, and 14px * 32 = 448px, which is indeed the width of the box.

Exact sizing on the order of 10% is not incredibly important anyway, because it strongly depends on the user-agent. You might see it as 14px = 1em in this exact case, but that presupposes a "base" ratio of 1rem = 16px, which is only a common default on desktop browsers and may be wildly different elsewhere (even on desktop, if someone has changed their font size). Inductiveload—talk/contribs 18:36, 21 November 2021 (UTC)
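The conversion above as a small worked calculation. This is a sketch that assumes the common 16px desktop default for the root font size; the 0.875 factor is the `.vector-body` rule quoted above, and the second call shows how the same 32em box grows if a reader has enlarged their base font.

```python
def em_to_px(em, root_px=16.0, scale=0.875):
    """Convert an em length inside .vector-body to CSS px.

    Assumes a root font size of `root_px` (16px is only a common default);
    `scale` is the 0.875 factor from the .vector-body rule.
    """
    return em * root_px * scale


print(em_to_px(32))        # 448.0 with the default 16px root size
print(em_to_px(32, 20.0))  # 560.0 if the reader has enlarged their font
```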

ppoem and left overfloat

Latest comment: 2 years ago, 4 comments, 2 people in discussion

Module:Ppoem uses %S to match the text before a <<<, meaning you can't overfloat a string with spaces in it. Is that deliberate? Xover (talk) 18:25, 21 November 2021 (UTC)

@Xover no, I don't think it was. Just didn't think of that. Inductiveload—talk/contribs 18:27, 21 November 2021 (UTC)
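To illustrate the difference, here is a Python analogue of the two kinds of pattern. It is not copied from Module:Ppoem: the assumption is that the module's pattern behaves like `(%S+)<<<`, and that a Lua pattern such as `^(.-)<<<` (where `.-` is Lua's shortest-match repetition) would be the space-tolerant equivalent of the second regex.

```python
import re

# Analogue of a Lua pattern like "^(%S+)<<<": only non-space text can
# precede the marker, so anything containing a space fails to match.
NO_SPACES = re.compile(r"^(\S+)<<<(.*)$")
# Analogue of "^(.-)<<<": any text, shortest match, up to the first <<<.
WITH_SPACES = re.compile(r"^(.+?)<<<(.*)$")

print(NO_SPACES.match("Chorus.<<<Sing a song"))
print(NO_SPACES.match("Enter Hamlet.<<<To be"))
print(WITH_SPACES.match("Enter Hamlet.<<<To be"))
```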

Looks like you quickly run into trouble with the 2em left/right gutters when you stuff arbitrary strings (i.e. wider than 2em) into these. Not sure whether that's "Don't do that then.", adding template params to control the gutter widths, or pointing the issue to index CSS. Possibly we could guesstimate the needed width based on the string length and handle it automagically, but that sounds… hacky. The one I ran into is a once-off that can squeak by as is, and I haven't seen that in any of the other works I tested ppoem with, so I'm going to leave it to simmer for a bit. Xover (talk) 20:36, 21 November 2021 (UTC)

Note before adding template params: you can also control gutters with CSS using ws-poem-left-gutter and ws-poem-right-gutter. Maybe params are better, but like you say, let's see what boils over! Inductiveload—talk/contribs 22:24, 21 November 2021 (UTC)

IME and ULS and Beta Code (and interested in testing?)

Latest comment: 2 years ago, 3 comments, 2 people in discussion

A while ago (5 Sept) I mentioned I was writing a user script to do easy keyboard entry of Ancient Greek polytonic script, using w:Beta Code, which was also the basis for your template experiment {{betacode}}.

And it was <!*wonderful*!> with multiple options for visual feedback and all the easiness of Beta Code. I used it a lot at EL on a bit of ambition.

But it felt a bit hacky and was a lot of work, working around Wikimedia and doing all my own UI visual displays. And I remembered your comments about ULS and jquery.ime.

After several days of discovery and coding I've convinced jquery.ime to do Beta Code, with both strict rules and deliciously loose rules. It isn't as beautiful, but still wonderful to use, and the Wikimedia people will not actually hate it.

But it *is* rather hard to demonstrate 'live'. I've worked out a way to bootstrap the development copy of the new rules into online wikisource, but it requires a localhost HTTP server and many (>10) steps to force it into the live wikisource IME for use and testing.

Thing is, a Beta Code implementation using jquery.ime's very basic tools was complicated. I had to write programs to generate all the rules. Even the strict rules set is 200 rules of magic (~275 total). The loose rules - very kind to users - is 1200 rules of magic (~1275 total). The previous largest jquery.ime rules set was 179 rules. The wikimedia people might have heart attacks?

Soon I'll be ready to submit a pull request at Github, but I understand they are kind of slow(?) merging pulls into that project, and then merging jquery.ime updates into Wikimedia. So figuring out how to excite people is a goal?

So I did it using ULS and jquery.ime. Any advice on getting this into wikisource this year? :-) Shenme (talk) 06:38, 15 November 2021 (UTC)

@Shenme this looks amazing. I don't have much advice for getting things into the code base other than "be very, very patient" and then "be more patient" and then "don't kick up a fuss and just take the beating when you have followed existing code and documentation but still get told it's wrong and go round the review process dozens of times" because the process can take months and months and can also be incredibly frustrating to the point that I sometimes seriously consider kicking a dustbin across the room and giving up on anything that needs a merge. "Fortunately" the deployment pattern for userscripts and gadgets is so broken that I keep going back to merges as a way to get any code deployed anywhere near sanely. You have probably also noticed this by trying to deploy something locally.

I will try to dig into it, but my initial feeling is that using combining diacritics will allow a substantial saving in rule lists.

Even if we cannot get it into the upstream, we can first deploy here as a gadget, and if the ULS people still can't be persuaded of its utility, we might also be able to squeeze it into the Wikisource extension. Inductiveload—talk/contribs 07:50, 15 November 2021 (UTC)
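To make the combining-diacritics idea concrete: instead of one rule per precomposed polytonic character, map each Beta Code letter and each diacritic separately to a base letter or a combining mark, then let Unicode NFC normalization compose them. This minimal Python sketch covers lowercase letters only and ignores final sigma; it does not use jquery.ime's rule format, and whether the normalization step fits into jquery.ime's rule mechanism is untested here.

```python
import unicodedata

# Lowercase Beta Code letters (common TLG mapping); final sigma is ignored.
LETTERS = dict(zip("abgdezhqiklmncoprstufxyw", "αβγδεζηθικλμνξοπρστυφχψω"))
# Beta Code diacritics mapped to Unicode combining characters.
DIACRITICS = {
    ")": "\u0313",   # smooth breathing (psili)
    "(": "\u0314",   # rough breathing (dasia)
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex (perispomeni)
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript (ypogegrammeni)
}


def beta_to_greek(text):
    """Convert lowercase Beta Code to polytonic Greek via NFC composition."""
    out = [LETTERS.get(ch) or DIACRITICS.get(ch) or ch for ch in text]
    # NFC folds base letter + combining marks into precomposed characters.
    return unicodedata.normalize("NFC", "".join(out))


print(beta_to_greek("a)/"))  # alpha with smooth breathing and acute
```

That is 31 mappings plus one normalization call, rather than a rule for every letter/diacritic combination.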

Still in progress, but first you fix other people's problems? Oh well, useful social credit. Shenme (talk) 04:52, 23 November 2021 (UTC)

Error Message on MC

Latest comment: 2 years ago, 7 comments, 2 people in discussion

I've frequently been getting this message "The time allocated for running scripts has expired." on the main MC challenge page and none of the books are showing. Could you please take a look? Languageseeker (talk) 23:39, 18 November 2021 (UTC)

@Languageseeker hmm, I guess the accretion of new indexes into such a large MC is pushing the limit for the Lua stuff. Inductiveload—talk/contribs 14:08, 19 November 2021 (UTC)

@Languageseeker FYI this is (hopefully) fixed by phab:T296092 and a backport deployment of the fix will be done tomorrow. Inductiveload—talk/contribs 18:41, 21 November 2021 (UTC)

Wow, that's so fast. Languageseeker (talk) 22:20, 21 November 2021 (UTC)

@Languageseeker well...let's see if it works first! ^_^ Inductiveload—talk/contribs 22:34, 21 November 2021 (UTC)

@Languageseeker it is now deployed. The MC pages are still not blazingly fast, but they seem substantially better, and I think we shouldn't have issues with challenges around the 50 index mark any more. Inductiveload—talk/contribs 12:54, 22 November 2021 (UTC)

Definitely feels faster! Languageseeker (talk) 21:19, 22 November 2021 (UTC)

some files

Latest comment: 2 years ago, 14 comments, 2 people in discussion

I found a book that did not have a printed copyright; it is probably from 1931 but maybe from 1915, and I scanned it. I scanned it to JP2, converted that to PNG and uploaded the files here.

If necessary, I will put it through a process to get approval (or not), although that didn't go so well here, so maybe at Commons; some steering in the right direction would be appreciated.

As I see it, worst case, the scans are not approved and go with the files to be released in 2029. That is not so bad (it's not great, though). So, the files are at Commons. Included with them is an advertisement for this book from another book, which is here. There is also an ad for the other book in that category; I really thought I saw the image on that ad in this book, but have failed to find it.

A couple of other things. If you prefer JP2, I can manage that (I found a place to upload). Also, if you would like a set of duplicates prepared specially for Tesseract, I can make those (I just need to dust off my script); I don't have Tesseract on this computer, so I cannot provide the text files.

Also, thanks for cleaning up that toc! I should have looked at that also because all of the other multi-page things needed tweaking also.--RaboKarbakian (talk) 03:44, 18 November 2021 (UTC)

((also, I scanned the blank pages because I scanned all of the odd pages first and the even pages second and I can juggle, but maybe not so well. The next book is a 2 volume set from 1896!! They are some of the most beautiful books I have ever had my hands on!!))

OK, well I can make a DjVu out of that easily enough (is that the question?) Some hints for the next scan:

For the images, do try to get into the spine a bit more, because the scanner has a very low depth-of-field and loses focus in the gutter: phab:F34753564. This is hard on a flatbed, but if you're hoping to scan a lot, you may consider a book scanner with an "edge bed" like an OpticBook (I don't know if that's actually any good in terms of image, I just know the 3600 model is cheap on eBay, I don't have one myself). Alternatively, the time-honoured DIY method is a (good) camera on a tripod and a sheet of glass to flatten the page. This is slow and fiddly, but gets excellent results, and if your camera, lens and lighting are good it is probably better in terms of colour reproduction than whatever manky electronics they shove into a consumer-grade scanner. I wouldn't want to scan a whole book like that, but maybe it's practical for only the images. The next step up is a v-cradle scanner like the IA themselves use, but that's some serious DIY unless you're really scanning a lot of books. The actual optical setup is still a real camera + glass sheet, it's just a question of throughput at that stage.
Do try to rotate the files before uploading. If you're already batch-converting JP2→PNG with ImageMagick, it's in the same command: mogrify -format png -rotate 90 *.jp2
You do not need special versions for Tesseract - it has a built-in binarisation step that will handle these images perfectly well. It's when the image has poor contrast between text and background (e.g. dark paper, light print, bleed-through, bad scanning or something like that) that you might consider a pre-OCR processing step.

That said, except for the gutter, the scans are very very good indeed, right down to the printing "dots". For "copyright clearance", the best forum is probably WS:CV. Inductiveload—talk/contribs 07:52, 18 November 2021 (UTC)

I have a great camera, (I worry sometimes that it is worth more than me!) and I saw the howto at IA but maybe I should get my brother onto the hardware. My inter-library loans will end soon though.

Just for your growing ability to look at images and figure out what happened between inception and delivery: I blocked off an area of the scanner bed (using the dividers from a box of tea bags, actually) because the edges of the glass don't scan, even if their rules make it look like they do. I had the scanner software rotate the even numbered scans as I used the same area for both sides of the open book. My conversion script is just a format conversion, nothing more. So, half of them were scanner rotated only. The covers were apparently on the scanner bed in the right direction. To clarify: the only rotation done by me was via the scanner, rotating the even numbered scans.

My script for preparing the scans for Tesseract was the Gutenberg recipe. I was reminded of it by the scans at Hathi, which have been very clearly posterized in a truly harmful way to what had been beautiful little line drawings (some sniffles, a couple of tears shed, the suppression of a hatred for how this world works, etc.)

There is an interesting thing about this book. I have two copies, one a ninth edition and the other, less old and with less blown reds, from Lippincott. Both are from libraries. The unnumbered pages: in each copy they are in a different order. All the worse because it is a poem I kind of know. So, </whine> and thanks for all the information and analysis!--RaboKarbakian (talk) 14:25, 18 November 2021 (UTC)

Yes, I can see you haven't rotated them, because they're (nearly) all sideways. My point is you can fix that in a few seconds with ImageMagick: mogrify -rotate 90 *.png, or even do it in the same command as the format-shift to PNG.

Generally, I'd say don't use any processing software that comes with a scanner, because it's pretty universally junk. Scanners are for getting the images onto a computer where you can handle them with real tools.

You do not need to process these particular images at all to feed them into Tesseract - that's only needed if the default binarisation fails. tesseract image.png - -l eng works just fine. I'd say you could try it for yourself with the OCR tool like this, but since the image is sideways, it won't work. Inductiveload—talk/contribs 14:40, 18 November 2021 (UTC)
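A minimal sketch of doing the format shift, rotation and OCR in one pass, written in Python for illustration. It assumes TIFF scans with hypothetical names like scan_001.tif, and that ImageMagick (convert in version 6, magick in version 7) and Tesseract are installed; by default it only prints the commands. Whether you need -rotate 90 or -rotate 270 depends on which way the scans are sideways.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path


def convert_and_rotate_cmd(src: str, degrees: int = 90) -> list[str]:
    """ImageMagick 6 'convert': change format to PNG and rotate in one pass.

    On ImageMagick 7 the same arguments work with 'magick' instead of 'convert'.
    """
    dst = str(Path(src).with_suffix(".png"))
    return ["convert", src, "-rotate", str(degrees), dst]


def tesseract_cmd(image: str, lang: str = "eng") -> list[str]:
    """Tesseract with '-' as the output base, so the text goes to stdout."""
    return ["tesseract", image, "-", "-l", lang]


def pipeline(scans: list[str], degrees: int = 90) -> list[list[str]]:
    """For each scan: convert and rotate, then OCR the resulting PNG."""
    cmds: list[list[str]] = []
    for scan in scans:
        png = str(Path(scan).with_suffix(".png"))
        cmds.append(convert_and_rotate_cmd(scan, degrees))
        cmds.append(tesseract_cmd(png))
    return cmds


def run_all(cmds: list[list[str]], dry_run: bool = True) -> None:
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))  # only show what would run
        else:
            subprocess.run(cmd, check=True)  # stop on the first failure


if __name__ == "__main__":
    run_all(pipeline(["scan_001.tif", "scan_002.tif"]))
```

If only some scans are sideways, pass just those files to pipeline().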

Also, should 15, 16 and 33 be in that category, or are they just gaps in the scan image numbering? Inductiveload—talk/contribs 18:04, 18 November 2021 (UTC)

I mentioned that the two books did not match and that there are no page numbers. I compared the books, and the book I scanned was clearly out of order. So, I reuploaded into the namespace and fixed it; however, for whatever reason, the correctly ordered book was two pages shorter than the book I scanned. The whole process was very disturbing. The files can be renumbered if necessary (I started again from the end when I got a little messed up...). I think that the one black and white plate is actually an end page, and if I were to organize them, I would have put the last color image before the last text page, but as it is, this book matches the other one.--RaboKarbakian (talk) 20:35, 18 November 2021 (UTC)

There was never a 33. That is an actual mistake I made.--RaboKarbakian (talk) 20:39, 18 November 2021 (UTC)

The question is: are the files on Commons a complete set and in the correct order if sorted numerically? Inductiveload—talk/contribs 22:30, 18 November 2021 (UTC)
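Both questions here (gaps such as the missing 33, and whether a numeric sort gives the right order) can be checked mechanically. A rough sketch, assuming each filename ends in its scan number (a hypothetical naming pattern); note that a plain string sort would put "Scan 10" before "Scan 9", which is why the sort key is numeric.

```python
from __future__ import annotations

import re
from pathlib import Path


def scan_number(filename: str) -> int:
    """Last run of digits in the name, ignoring the extension.

    'Foo - 1915 - 012.jpg' -> 12 (the year is skipped because it is not last).
    """
    numbers = re.findall(r"\d+", Path(filename).stem)
    if not numbers:
        raise ValueError(f"no number in {filename!r}")
    return int(numbers[-1])


def check_sequence(filenames: list[str]) -> tuple[list[str], list[int], list[int]]:
    """Sort numerically; return (ordered names, missing numbers, duplicated numbers)."""
    if not filenames:
        return [], [], []
    ordered = sorted(filenames, key=scan_number)
    numbers = [scan_number(name) for name in ordered]
    seen: set[int] = set()
    duplicates: list[int] = []
    for n in numbers:
        if n in seen:
            duplicates.append(n)
        seen.add(n)
    # Every number between the first and last scan should be present once.
    missing = [n for n in range(numbers[0], numbers[-1] + 1) if n not in seen]
    return ordered, missing, duplicates
```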

To the best of my knowledge, yes. Also, thank you so much for your time and patience and your sharing of knowledge throughout this endeavor of mine.--RaboKarbakian (talk) 00:33, 19 November 2021 (UTC)

Here ya go! Index:The Night Before Christmas - 1915 - Moore.djvu. Inductiveload—talk/contribs 10:40, 22 November 2021 (UTC)

I completely missed this!! I had to get slapped around at the commons before I saw it, even. I hate it when I am the one who sucks. So, thank you so much! I got uploading to do, if they would stop slapping me around at commons.--RaboKarbakian (talk) 17:03, 29 November 2021 (UTC)

out of order books

I don't know where to take this so, please allow me to just spew here. I had those two Night Before Christmas books, and the one I scanned was out of order by the words. I think it was in order, however, for the pictures. It is early in my day and I am not together enough to look at it.

The pictures in The Night Before Christmas (Rackham) don't match the words. I am thinking about redoing it, in my User space, not scan backed, so the pictures can go with the right words. It is a little jarring to my sensibility as it is.

Also, having transcribed it, I think the 1915 version must have been beautiful, whereas this old (probably 1931) scan I used looks like a reassembled thing. This scan I made is just another fuzzy proof of the earlier edition.</spew>--RaboKarbakian (talk) 13:46, 30 November 2021 (UTC)

@RaboKarbakian I am unclear if there is any action you'd like me to take. Inductiveload—talk/contribs 13:47, 30 November 2021 (UTC)

I don't know the reason I needed to type words about this, but I did. This was the best place to type them. I should have typed "thanks for the djvu", etc. The good djvu got me thinking. No action from you required. Thanks for the consideration.--RaboKarbakian (talk) 13:53, 30 November 2021 (UTC)

Add OCR to Index:An American dilemma the Negro problem and modern democracy (First Edition).pdf

Could you add an OCR layer to this text? It's slated for the December MC. Thanks. Languageseeker (talk) 04:40, 26 November 2021 (UTC)

@Languageseeker I can do it, but could you please do the page list first? DLI books are often pretty poor scans, so I'd rather not convert it and only then find that there are pages missing or something. Also the easiest thing to do for these is to re-import as a DJVU with https://ia-upload.wmcloud.org, which will also do Tesseract OCR on the way through, so the result will be the same as if I did it. Inductiveload—talk/contribs 17:06, 29 November 2021 (UTC)

Hmm, OK so actually it does not look like that import is going well at all! The DLI books are such a mess: they import the PDF and then the IA extracts to JP2, but because they're going "backwards" from PDF to JP2, the files end up encoding a ton of compression noise, resulting in a >2GB tarball. I did think the IA import would work, though; if it doesn't, that should be fixed. I'm downloading the PDF now (taking a while at 125kB/s): I'll convert/OCR it and upload if the IA-Upload does indeed fall over. But I'd still like a pagelist if you could :-) Inductiveload—talk/contribs 17:56, 29 November 2021 (UTC)

I decided to scrap the idea. It seems like too much work for a poor-quality scan. HT also has scans of the same edition, but they are behind a protection wall. I've asked them to remove the protection. Let's see how that goes. Sorry for the bother. Languageseeker (talk) 21:40, 29 November 2021 (UTC)

@User:Languageseeker it's not especially hard to convert, just takes time to download. I don't think HT usually respond to such requests (oddly enough, Google are good about that) but do let me know if they do. In the meantime, I'll happily upload a DjVu from one of the DLI scans if you can do the pagelist and let me know if it's complete. Inductiveload—talk/contribs 21:48, 29 November 2021 (UTC)

Looks like the scan is missing some pages. It's probably best to let this idea rest. Languageseeker (talk) 22:00, 29 November 2021 (UTC)

OK, give me a ping if you get hold of a complete scan (or a set of scans that can be patched into a complete set) and I'll see what I can do. Inductiveload—talk/contribs 22:16, 29 November 2021 (UTC)

It turns out it was the opposite problem: four duplicate pages. I corrected the page list. I haven't flipped all 1,500+ pages, but a sampling indicates that the scan is complete. Languageseeker (talk) 05:30, 30 November 2021 (UTC)

OK, dupes removed and OCR added. File now at Index:An American dilemma the Negro problem and modern democracy (First Edition).djvu.

You do not usually need to flip every page: you can generally tell with good confidence that the scan is complete if the page numbering is correct at the end of the file. If pages are missing or duplicated, the page numbers will be out of step. In this case, you can tell the pages are probably correct because Page:An American dilemma the Negro problem and modern democracy (First Edition).djvu/1541 is correctly numbered as 1483. Of course, pages could still be jumbled, or duplicates could balance out missing pages, but it's very very rare for that to happen "perfectly" so that the pages still line up after the defects. Inductiveload—talk/contribs 10:14, 30 November 2021 (UTC)
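The end-of-file check described above amounts to comparing the scan index and the printed page number at two points: if the gap between them has not changed, nothing was added or lost in between. A minimal sketch; apart from the 1541/1483 pair quoted above, the checkpoint values in the test are made up, and in a real book unnumbered plates would have to be accounted for first.

```python
from __future__ import annotations


def numbering_drift(anchor_scan: int, anchor_page: int, scan: int, page: int) -> int:
    """Extra scans between two checkpoints, relative to the printed numbering.

    0  -> numbering lines up (probably complete),
    >0 -> more scans than pages (duplicates, or unnumbered inserts),
    <0 -> fewer scans than pages (missing pages).
    """
    return (scan - anchor_scan) - (page - anchor_page)


def first_drifting_checkpoint(checkpoints: list[tuple[int, int]]) -> int | None:
    """Scan index of the first (scan, page) checkpoint that is out of step, if any."""
    anchor_scan, anchor_page = checkpoints[0]
    for scan, page in checkpoints[1:]:
        if numbering_drift(anchor_scan, anchor_page, scan, page) != 0:
            return scan
    return None
```

To locate a defect, add more checkpoints from the pagelist and bisect between the last good one and the first drifting one. As noted above, a duplicate and a missing page between the same two checkpoints cancel out and will not show up.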

Thank you! I might run it for Jan 2022 because December is quite crowded already. Thank you for the information about the pages. I'll keep it in mind for the future. Languageseeker (talk) 22:20, 30 November 2021 (UTC)

We need some stats, stat!

Well, or not so "stat". But somewhere (MC summary? Some diff I saw somewhere in any case) you referenced the phetools page stats as a point of reference for the MC stats. So before the whole matter drops from my frazzled mind, I thought I'd mention that I have on my plan doing some work on the stats code in phetools at some point in the not too distant future (maybe). Prime mover is improving the graphs in various ways, and secondary is cleaning up the way the stats are persisted (it's currently dumping a stringified Python datastructure to a text file, and reading and exec()ing it on next run). But once I go digging there may be opportunities for other improvements, such as anything the MC might need. If you give me a wishlist I can try to keep it in mind whenever I get around to that project (over the Christmas hols at the earliest, and absolutely no promises on anything). Doing all the MC stats in phetools would probably require more "on-wiki knowledge" than is sane to implement there, but anything generic / cross-project-applicable that would help or remove friction is fair game. Xover (talk) 06:41, 2 December 2021 (UTC)
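For the persistence clean-up, the usual replacement for writing out a stringified structure and exec()ing it back is plain JSON. This is a generic sketch only, not phetools' actual code, and the file layout is hypothetical. One caveat: JSON object keys are always strings, so integer keys come back as str and need converting.

```python
from __future__ import annotations

import json
import os
import tempfile
from pathlib import Path


def save_stats(path: Path, stats: dict) -> None:
    """Write stats as JSON via a temp file + rename, so a crash can't leave half a file."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(stats, f, indent=1)
        os.replace(tmp, path)  # atomic on POSIX when on the same filesystem
    except BaseException:
        os.unlink(tmp)
        raise


def load_stats(path: Path) -> dict:
    """Parse the file as data only; unlike exec(), nothing in it is executed."""
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```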

@User:Xover that's a kind offer. I don't think the MC actually needs much in the way of stats support from Phetools, the bot is chugging along happily enough.*

What I do actively miss is the ability to get the "uplifts" for the whole wiki on a year-by-month and month-by-day basis. For example, if I want to check the figures for November, I have to check on 1st Dec (and even then, that's only actually accurate for 30-day months).
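If daily snapshots of a cumulative wiki-wide count (say, total Proofread pages) were kept, both the month-by-day and the year-by-month figures reduce to subtraction, and calendar.monthrange takes care of 28-, 29-, 30- and 31-day months. A sketch assuming such snapshots exist (the data layout is hypothetical); the result is a net change, so downgrades or deletions reduce it.

```python
from __future__ import annotations

import calendar
from datetime import date, timedelta


def daily_uplifts(snapshots: dict[date, int]) -> dict[date, int]:
    """Per-day net change, for days whose previous day was also snapshotted."""
    result: dict[date, int] = {}
    for day in sorted(snapshots):
        prev = day - timedelta(days=1)
        if prev in snapshots:
            result[day] = snapshots[day] - snapshots[prev]
    return result


def month_total(snapshots: dict[date, int], year: int, month: int) -> int:
    """Net change over a calendar month of any length.

    Needs snapshots for the month's last day and for the day before the 1st;
    raises KeyError otherwise.
    """
    last_day = calendar.monthrange(year, month)[1]
    end = date(year, month, last_day)
    before = date(year, month, 1) - timedelta(days=1)
    return snapshots[end] - snapshots[before]
```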

* The change-tag-based progress history API will make it much easier in future to get change histories for sets of pages, but that's stuck in review/deployment hell, so who knows.

As for getting the sets of pages, the API for getting all pages in an index and using them as a generator exists now (docs here), but hasn't been deployed this week as expected because the RelEng folks are "distracted" and nothing is being deployed. Inductiveload—talk/contribs 08:59, 2 December 2021 (UTC)
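Walking every page of an index should follow the normal MediaWiki API continuation pattern: merge the response's continue object into the next request until it stops appearing. In this sketch the list=proofreadpagesinindex module and the prppiititle parameter are assumptions on our part and should be checked against the pagination API docs; fetch stands for any function that performs the HTTP GET and returns parsed JSON.

```python
from __future__ import annotations

from typing import Callable, Iterator


def iter_index_pages(fetch: Callable[[dict], dict], index_title: str) -> Iterator[dict]:
    """Yield page records for one index, following API continuation."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        # ASSUMPTION: module and parameter names; verify against the
        # ProofreadPage pagination API documentation before use.
        "list": "proofreadpagesinindex",
        "prppiititle": index_title,
    }
    cont: dict = {}
    while True:
        data = fetch({**params, **cont})
        yield from data.get("query", {}).get("proofreadpagesinindex", [])
        if "continue" not in data:
            break
        cont = data["continue"]  # standard MediaWiki continuation: send it back verbatim
```

With the requests library, for example, fetch could be lambda p: requests.get(api_url, params=p).json(), where api_url is the wiki's api.php endpoint.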

This is nice, it will simplify this a lot: https://github.com/wikimedia/pywikibot/blob/master/pywikibot/proofreadpage.py#L910, after adding support for the new API in pywikibot. Maybe during Xmas holidays ... :-) Mpaa (talk) 22:51, 2 December 2021 (UTC)

@Mpaa yep, that was part of the motivation. Also PWB can now access index fields as JSON which might help too (mw:Extension:ProofreadPage/Index data API). Inductiveload—talk/contribs 23:02, 2 December 2021 (UTC)

Ok, more work then! Backward compatibility, especially for the first one, will be needed to support old wikis. Mpaa (talk) 23:07, 2 December 2021 (UTC)

@Mpaa Out of interest, who is using ProofreadPage outside of the WMF deployment zone (and therefore are on older versions)? Also, if you have smart ideas about useful API stuff, there's a whole column on Phab for it, and I'll be happy to try to make a dream come true if I can. Inductiveload—talk/contribs 23:12, 2 December 2021 (UTC)

In practice I guess no one, but in my experience I always got comments about compatibility when adding stuff to PWB, as PWB supports a certain range of wmf-versions. Sure, I will keep in mind the API stuff. Happy to see the extension has some more people to help Tpt. Mpaa (talk) 23:25, 2 December 2021 (UTC)

Casing in {{smallcaps}}

Hello,

Thank you for the tip, it is not always easy to know the best way to make the text look good with all those models lying around. I am actually pretty proud of myself for remembering the existence of {{fraktur}}! ^_^ Ælfgar (talk) 21:11, 2 December 2021 (UTC)

@Ælfgar the learning curve is pretty vertical, isn't it? Just thought I'd let you know sooner rather than later.

BTW, normally you should reply to messages where they are left, otherwise it's just confusing. In this case, just reply on your talk page and I'll see it in my watchlist. Or you can ping me with @[[User:Inductiveload]] and I'll get a notification (just like you will get when I save this, because I pinged you at the start of the reply). Inductiveload—talk/contribs 21:17, 2 December 2021 (UTC)

Indexes as atomic units

Another random drive-by drop of a thought unchewed…

That MC display of works with the progress bar under a pretty cover image and some metadata would be useful in several contexts. Think Featured Texts type things, or users' personal bragging rights on their user page. But since live querying for that is kinda gnarly… And conceptually in the same vein as the pagination API… If we view the Index more as an atomic unit, of a "collection" type, of which the Page: pages are members (rather than loosely-coupled references to external resources)… Perhaps it would make sense to track aggregate status in the Index, analogously to how a File page will tell you (and expose through the API) how many pages it has?

It wouldn't really be worth the effort for just a pretty display, but this "treating the index as a unit" thing has been nagging at me for a long time now and shapes other stuff. Like the Pagination API and what it enables.

I'm not sure it's a good idea, what it would actually be, its consequences, or its feasibility of implementing. But since you have your fingers deep down in the guts right now I figured I'd sorta seed the idea that might at some point sprout into… something. Xover (talk) 12:12, 4 December 2021 (UTC)

@Xover, if I understand correctly, we do already have Lua access to "aggregate status" (i.e. the counts of pages of each status) via the Index object (this drives {{progress bar}}). We also have access to the index fields via Lua, which can be used for site-local stuff like transclusion status.
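The arithmetic behind such a progress bar is simple once the per-level counts are available. This is a Python illustration only (the real template does it in Lua via the Index object); the level numbers and names follow ProofreadPage's usual five quality levels, and the optional total_pages (pages in the file) lets it show pages not yet created.

```python
from __future__ import annotations

# ProofreadPage quality levels (0 to 4) and their usual names.
LEVEL_NAMES = {
    0: "without text",
    1: "not proofread",
    2: "problematic",
    3: "proofread",
    4: "validated",
}


def progress(counts: dict[int, int], total_pages: int | None = None) -> dict[str, float]:
    """Percentage of pages at each level, plus pages not yet created."""
    created = sum(counts.values())
    total = max(total_pages or 0, created)
    if total == 0:
        return {**{name: 0.0 for name in LEVEL_NAMES.values()}, "not created": 0.0}
    result = {name: 100.0 * counts.get(level, 0) / total
              for level, name in LEVEL_NAMES.items()}
    result["not created"] = 100.0 * (total - created) / total
    return result
```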

As for "real" API, we don't yet have index-level stats but it's on the list (and not too hard since the internals exist already). We do have index field access (in JSON, even) via Index data API.

We will also need more formal access for both Lua and API access to the Index data JSON, since that controls what index fields mean what. This can already be done with both Lua and API, but not in a dedicated implementation-independent way.

Does any of that sound like it might be co-opted for something useful? Inductiveload—talk/contribs 20:36, 4 December 2021 (UTC)

[ itym {{index progress bar}} :) ]

Modulo caching, mw.ext.proofreadPage triggers a DB query on the associated Page: pages every time :pagesWithLevel() is called, doesn't it? I was thinking more along the lines of a page property on the Index: page, that generally only triggers work when the Index itself is rerendered (edited/purged). Imagine the performance if, say, Billinghurst decided they wanted an MC-like display of the bragging list on their user page (have you seen that monster?!?), or any other context where it could legitimately get fed a "MC times ten"-sized number of indexes on a single wikipage.

The idea was that if the Index is conceptually an atomic unit that "owns" the Page: pages, it would make sense for it to actually track those counts (conceivably by edits to the Page: namespace updating the count when the level changes). That way it's a straight fetch of ~4 props from the Index, or a batch fetch of same for several Indexes. It may be waaay over-engineered for its utility, and the performance characteristics and scalability of the status quo may be far better than I imagine it to be, but, well, I rarely let insignificant little things like reality get in the way of a good technological philosophising. Xover (talk) 21:05, 4 December 2021 (UTC)

@Xover (phone posting so it'll be brief) actually it's only a single DB lookup for the stats of an index, since the pr_index table already maintains a running total of its pages. Also it's cached by the Lua interface too, so once you ask for a stat in a single page render, the others come for free.

The perf hot-spot is actually handling the tens of thousands of "dependency" pages (preview an MC page to see this in action), but since I fixed it last week we should be able to at least handle a hundred or so indexes on a single page render. In theory, we could add an "approximate" mode where the dependencies are skipped, at the cost of needing manual purging, since the page will no longer be sensitive to individual page status changes. Inductiveload—talk/contribs 21:29, 4 December 2021 (UTC)

Ah. So you're way ahead of me, as per usual, in that the index already tracks the status in the pr_index table. But creating a dep on, currently, 19592 pages just to display a pretty gallery of 52 works is pretty insane. Trying to edit that page is downright painful!

A "lazy" mode that doesn't update automatically seems pretty necessary, yes, and should probably be the default / strongly encouraged in the short term. If every Wikisource set up a MC patterned on ours we'd be talking a performance hit that would show up on the infrastructure graphs.

But, without any familiarity with the code, it seems to me that this is a problem that screams for a "push" style solution. Is it feasible to update the pr_index counts when a Page: page is saved? A if ($pageisnew) {update_pr_count();} if ($status_changed) {$old_level_total-- && $new_level_total++;} kind of thing, in the vicinity of the code that handles the page quality tags etc.? It'd create contention on pr_index, but only per index and that'd be, what, at most 10 people editing concurrently even for 1000+ page works (EB1911 and similar), so even a direct lock ought to be reasonable, and there has to be some kind of "one at a time" async queue mechanism in MW somewhere that could be used for this in a pinch. If async, the risk would be multiple status-changing edits to the same Page: page that get dispatched out of order, so there'd have to be some purge-time magic to update all in the brute force way, but I have trouble imagining that'd happen very often.
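The push-style pseudocode above can be made concrete as a small running-total structure, with a brute-force recount as the repair path. This is a hypothetical in-memory illustration in Python, not ProofreadPage's PHP or the pr_index schema (which, as noted in this thread, already keeps running totals on save).

```python
from __future__ import annotations

from collections import Counter


class IndexCounts:
    """Hypothetical running per-level totals for one index."""

    def __init__(self) -> None:
        self.levels = Counter()               # level -> number of pages
        self.page_level: dict[str, int] = {}  # page title -> current level

    def on_save(self, page: str, new_level: int) -> None:
        """Push-style update, run once per Page: save."""
        old_level = self.page_level.get(page)
        if old_level == new_level:
            return  # text edit only, no status change
        if old_level is not None:
            self.levels[old_level] -= 1  # status changed: move the page out
        self.levels[new_level] += 1      # new page, or its new level
        self.page_level[page] = new_level

    def recount(self) -> None:
        """Brute-force rebuild (e.g. on purge) if the totals have drifted."""
        self.levels = Counter(self.page_level.values())
```

In a real system, recount() would re-read the per-page levels from the database rather than from page_level, which is what would let it repair totals left wrong by out-of-order updates.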

Hmm. In fact… this problem seems really similar to updating large categories; this case just has a much steeper curve (categories grow linearly, while we multiply by number of pages). Could we reuse some of the machinery, or just approaches, from the category code? Maybe not. Based on what little I know about how those are handled, it seems plausible there are literal cron jobs running out-of-band maintenance scripts to update those. But I haven't seen the good old well-known update problems with cats in half a decade or so, so they may have actually fixed that in a way that could be copied or co-opted for this purpose.

There are edge cases galore with such an approach, but nothing a brute-force-on-purge couldn't fix, I don't think.

Anyways… Enough whiteboard quarterbacking from me. I have some possible use cases for this, but no pressing need (and no time even had there been a need), so I'm just bouncing stuff off you to see if there's anything at all sticky down in this spaghetti bowl. I'm not expecting you to go rearchitect the world based on my half-arsed musings, is what I'm saying. Xover (talk) 08:22, 5 December 2021 (UTC)

@Xover But creating a dep on, currently, 19592 pages just to display a pretty gallery of 52 works is pretty insane. You're not wrong, but those deps are actually required if you want the progress bars to be "live". Otherwise a change to one of the 19592 (!) pages will not propagate to the progress bars. But for "finished" works, it's barely an issue if the progress bar is live. Actually, for a finished work, the progress bar is pretty pointless anyway: looking at the index work status field for Proofread/Validated status is more useful.

The pr_index counts are indeed updated when a Page: page is saved: that is why it is a single DB lookup to get the counts; the many updates are amortised across the many saves. However, there's no (current) way to "sensitise" a page like an MC gallery to changes on each of the possible pages in the indexes, existing or not, without adding a template dependency on the page. It would be better if there was a way, as you describe, to allow the index page to "push" to interested pages that the counts have changed rather than having to have the individual pages do so. Something is updating pr_index on save; however, I do not know if there's a mechanism for marking an update like that in MediaWiki, to invalidate renders, or if it's something an extension can do, or what. Inductiveload—talk/contribs 14:16, 5 December 2021 (UTC)

Oh, hmm, I see.

This is the inverse of the "widely used template" problem: instead of 100k pages depending on a single template, it's a single page depending on 100k "templates" (which the parser would bail on because it can't do it performantly and hence has hard limits for). There must be some facility for it because those 100k pages using the single template do get updated eventually. But it is not always reliable so null edits are sometimes needed (or was at some point needed, anyway).

So long as it's just the MC, and it doesn't grow significantly, you can maybe just about squeak by with this approach. But then it's a pretty specialised solution, and in any case the limitation ("do not use for more than ~50 works comprising more than ~10k pages") ought to be glaringly documented. But that just makes me more convinced that your proposed "lazy" mode ought to be the default, and if the "live" mode is even available it needs explicit parser limits built in. Bot-purging the MC pages a couple of times a day would seem to be a reasonable tradeoff (how fast do you really need those to update?), and the interactive performance on those pages is …. Well, the Performance Team would quite literally send the ninja squad in black helicopters to give you a talking to if it ever hit their radar.

But this brings to mind the task flying around Phab somewhere (Tech Decision Forum maybe?), triggered by the latest insanity from the WMF: Wikipedia of Functions. The grand idea seems to be outright algorithms stored in a dedicated wiki that are reusable on other projects, so that, say, Wikipedia can refer to "that Wikidata query for demographics", "fed into this Wikipedia-of-Functions function to calculate a ten-year average", and spit out the result (but global/cross-wiki Modules/Templates are apparently an impossibility). Due to the obvious scalability issues, they're discussing implementing this asynchronously and having the actual MW parser just spit out a placeholder that gets filled in by client-side JS (how they expect to handle the new obvious performance issues with that approach is beyond me).

That's years away from deployment as yet, but it might be one approach to explore. For a lot of the use cases for index status progress bars you don't actually need the pre-rendered cached page to have up-to-the-minute data; so maybe fire JS onLoad that asynchronously, but serially, updates the counts. You could even build in TTLs and caching to speed it up and avoid stampedes. Not as elegant as having it all handled by the parser, but right now it turns out the parser isn't exactly doing a graceful ballet with this either (the coughing and asthmatic wheezing is kinda spoiling the performance). Xover (talk) 15:01, 5 December 2021 (UTC)
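The TTL-plus-serial-refresh idea can be sketched independently of the client-side JavaScript it would actually live in. A minimal Python version under stated assumptions: a single lock serialises refreshes, so a burst of requests for a stale key triggers one load instead of a stampede; load is a hypothetical function that fetches one index's counts, and the clock is injectable for testing.

```python
from __future__ import annotations

import threading
import time
from typing import Callable


class TTLCache:
    """Cache values for `ttl` seconds; refreshes happen one at a time."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._data: dict[str, tuple[float, object]] = {}  # key -> (stored_at, value)
        self._lock = threading.Lock()

    def get(self, key: str, load: Callable[[str], object]) -> object:
        with self._lock:
            entry = self._data.get(key)
            if entry is not None and self.clock() - entry[0] < self.ttl:
                return entry[1]  # still fresh
            # Loading while holding the lock serialises refreshes: callers that
            # arrive meanwhile wait, then see the fresh value instead of reloading.
            value = load(key)
            self._data[key] = (self.clock(), value)
            return value
```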

Certainly it's a topic that needs more work, but the initial implementation seems "somewhat functional". Anyway, if they send the black helicopters for me, they can first consider that the only reason a complete nub who learned PHP a couple of months ago is even writing this shi...uh...stuff on their own and probably getting it hilariously and egregiously wrong is because "The" WMF is not interested in doing its own homework. So, yah boo sucks.

I have considered client-side loading for the MC stats in particular, but I'd really like to avoid that for various reasons including moving part count, bus factors, laziness, the dread prospect of getting that reviewed if we wanted all WSes to have it, etc. The current "bot updates a Lua table, which triggers a re-render" appears to be working tolerably well. And that, indeed is an "async" update: the stats can be a bit stale, currently by up to 2 hours, since that's the bot run frequency. We do not have any direct API to generate the data for that, which is why it needs a bot anyway.
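The "bot updates a Lua table" step boils down to serialising a data structure as Lua source that a data module can return. A hedged sketch: the MC bot's real format isn't described here, so the function names and layout are hypothetical, and non-finite floats (inf, nan) are not handled.

```python
from __future__ import annotations


def to_lua(value: object, indent: int = 0) -> str:
    """Serialise None/bool/int/float/str/list/tuple/dict as a Lua expression."""
    pad = "\t" * indent
    if value is None:
        return "nil"
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        escaped = (value.replace("\\", "\\\\")
                        .replace('"', '\\"')
                        .replace("\n", "\\n"))
        return f'"{escaped}"'
    if isinstance(value, (list, tuple)):
        items = [f"{pad}\t{to_lua(v, indent + 1)}," for v in value]
        return "\n".join(["{", *items, pad + "}"])
    if isinstance(value, dict):
        items = [f"{pad}\t[{to_lua(k)}] = {to_lua(v, indent + 1)},"
                 for k, v in value.items()]
        return "\n".join(["{", *items, pad + "}"])
    raise TypeError(f"cannot serialise {type(value).__name__}")


def module_source(data: dict) -> str:
    """Source text for a data module that simply returns the table."""
    return "return " + to_lua(data) + "\n"
```

A module saved with this content can then be read from other modules with mw.loadData().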

If you think we should make lazy loading default, could you reply on phab:T297055 and we can centralise technical thoughts there and get Tpt and Co. looped in. Inductiveload—talk/contribs 15:21, 5 December 2021 (UTC)

Also a benefit of the dependencies for an active project, is that you can do this: recent changes for December 2021 MC.

So, if we had a "lazy" mode, it might still make sense to keep the current MC in "keen" mode, and then drop to lazy mode once it becomes archival (or even just drop the progress bars entirely). Inductiveload—talk/contribs 08:27, 6 December 2021 (UTC)

There's always Special:RecentChangesLinked that can be wrangled into showing changes related to some arbitrary subset of pages. Xover (talk) 08:12, 7 December 2021 (UTC)

That would require me to construct a coherent thought and do some actual research. Which sounds… ambitious. :) Xover (talk) 08:10, 7 December 2021 (UTC)

Of Periodicals and Pagelist

I saw your comment on Phab, but I thought that this might be a better place for a discussion. I understand your concerns for the gnomes. They work tirelessly to keep this place running. However, when it comes to pagelists, I think that the benefits of having the periodical scans on WS outweigh the downsides of a backlog. Realistically, if we are to upload all the volumes of the general-interest periodicals that are in the PD, they would amount to several thousand volumes. No one user can or should have to create the pagelists for all of them. Indeed, some of them will also require the finding and insertion of missing pages. With that being said, having these scans will enable users to proofread articles from them. The sheer difficulty of uploading periodicals causes users to either skip them or proofread them outside of WS and upload them as non-scan-backed versions. I also don't think that the WS model is that a user has to do all the work by themselves. It takes time to find the best scans for a work, more time to create the metadata, even more time to upload them, even more time to create a pagelist, and a huge amount of time to proofread the volume. Each one of these steps can be done by a separate user. Batch uploading the volumes without placing all the burden of creating the pagelists on either yourself or myself will enable more users to help out. Together, it will be done faster than either of us can do it alone. Languageseeker (talk) 02:10, 7 December 2021 (UTC)

John Curtis

Please check Author:John Curtis, ASAP. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:46, 14 December 2021 (UTC)

@Pigsonthewing well that was completely bizarre, it looks like it was chasing a circular redirect. Thanks for the heads up. Please check it's what you expect now. ASAP. Inductiveload—talk/contribs 21:53, 14 December 2021 (UTC)

Looks good now, thank you. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:59, 14 December 2021 (UTC)

Module:PotM/data

Hi! Whenever I open any page of Index:The Origin of the Bengali Script.djvu, four transclusion tabs are displayed at the top (I have this feature installed). One tab is for the actual transclusion page, and the other three are Main Page, Main Page/sandbox and Main Page/sandbox2. And in those three pages, a Source tab is displayed at the top, pointing to the POTM index. It seems to me that Module:PotM/data behaves like a transclusion and as it is invoked on the Main Page and its sandboxes, our software treats the matter as transclusion. A user recently raised the issue in the global wikisource Telegram forum. Can you do something about this? Hrishikes (talk) 04:00, 19 December 2021 (UTC)

@Hrishikes This is the User:Inductiveload/jump to file script, right?

This is actually caused by the progress meter (I think), since that has a dependency on the pages, which counts as a "transclusion" even if there is no actual content presented. AFAIK this is a limitation of the MW core handling of parser dependencies: there is no way to declare a non-transcluded dependency. Or maybe there is, but I don't know of it.

I can work around it with a heuristic like "no sandboxes or Main page", which should cover 99% of cases (since there's nowhere else I can think of in mainspace a page could be "para-transcluded" to).
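A minimal sketch of that heuristic, assuming the list of transcluding titles is already in hand and that the Main Page is literally titled "Main Page" (true on enWS, not on every subdomain). The function names are illustrative, not the script's actual code.

```python
from __future__ import annotations


def is_main_page_or_subpage(title: str, main_page: str = "Main Page") -> bool:
    """Heuristic: the Main Page itself or any subpage such as 'Main Page/sandbox'."""
    return title == main_page or title.startswith(main_page + "/")


def real_transclusions(titles: list[str], main_page: str = "Main Page") -> list[str]:
    """Drop dependency-only 'para-transclusions' from a list of titles."""
    return [t for t in titles if not is_main_page_or_subpage(t, main_page)]
```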

Also TIL Wikisource has a Telegram group, is that actually written down anywhere? Inductiveload—talk/contribs 11:31, 19 December 2021 (UTC)

No, I got it by MediaWiki:TranscludedIn.js. And the Telegram group for Wikisource Global Community is quite old; Wikidata, Commons etc. also have their own global groups in Telegram, and all these are very lively groups. You can answer the matter in Telegram, by going to 1. Hrishikes (talk) 13:33, 19 December 2021 (UTC)

This particular matter has been solved in Bengali Wikisource, by keeping the Main Page in Wikisource namespace, instead of NS0. That's why bnWS is 100% scan-backed (see at 2), but enWS cannot be, even theoretically, because of the presence of the Main Page in NS0. Hrishikes (talk) 13:45, 19 December 2021 (UTC)

@Hrishikes: Not that it has any particular relevance to the issue at hand, but enWS currently has 206 966 mainspace pages that are not scan-backed. Since 2019 we also increased our number of Page-namespace pages that are "Not Proofread" from about 500k to over one million (for comparison we have 1 434 979 that are "Proofread" and 524 976 that are "Validated"). In fact, over a week or two in March or April this year we increased this backlog by something like 150k pages (meanwhile, the Monthly Challenge is averaging something in the range of 2000–5000 pages per month processed; not only will it take more than a decade to scan-back the current backlog, but the backlog is growing way faster than we can reduce it). Just sayin´… Xover (talk) 14:32, 19 December 2021 (UTC)
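As a rough check of the "more than a decade" estimate, reading "the backlog" as the roughly one million Not Proofread pages (our interpretation) and ignoring its growth:

```python
def years_to_clear(backlog_pages: int, pages_per_month: int) -> float:
    """Years needed to work through a fixed backlog at a steady monthly rate."""
    return backlog_pages / pages_per_month / 12
```

years_to_clear(1_000_000, 5000) is about 16.7 and years_to_clear(1_000_000, 2000) about 41.7; the 150k jump alone is 150_000 / 5000 = 30 months of Monthly Challenge output at the top rate.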

@Xover: -- I don't know whether you are aware, but I don't work only in bnWS, I also work here, at least from time to time. So I am aware of the general picture, if not the exact statistics, of the situation described by you. But my point was theoretical, as clearly mentioned, that it is not theoretically possible to make this site 100% scan-backed as long as the Main Page is in the transclusion zone. Anyway, that was a secondary point only. The primary point was the "Source" tab displayed at the top of the Main Page, which came up for discussion in a global Wikisource forum. I am not techno-savvy, so I could not respond there. That's why I asked IL here. Regards. Hrishikes (talk) 15:52, 19 December 2021 (UTC)

@Hrishikes: I should apologise for butting in: it was an unrelated rant, you just happened to mention it just when I was looking at the latest depressing numbers. :)

I am indeed aware of your work here, and it is very much appreciated! Xover (talk) 17:02, 19 December 2021 (UTC)

@Hrishikes: You're cross-loading that script from mul:MediaWiki:TranscludedIn.js, so nobody here on enWS has edit rights to the script. But perhaps Candalua can help? It was written in 2012 and last updated in 2015, so it's probably ripe for some modernisation in any case (most scripts of that age are, as a result of technological changes in the surrounding environment). Xover (talk) 07:53, 20 December 2021 (UTC)

@Xover: -- No need to edit that script. If you go to the Main Page, you will see the "Source" tab at the top, which is not dependent on user-script. That was the item that came up for discussion; and that source tab can be seen by anyone, without any script. The script cited by me just gives the reverse scenario from the Page: namespace, no need to do anything about that script. But can anything be done about the Main Page issue? Hrishikes (talk) 08:03, 20 December 2021 (UTC)

@Hrishikes: Ah. Yes, that is indeed a different issue; and that one would need to be fixed in the Proofread Page extension itself as that's where the script that adds the "Source" tab comes from. I'm not sure, though, whether that kind of special-casing would make sense; especially since the problem is currently enWS-specific (so far as I know).

@Inductiveload: Trying to generalise… Would it make sense to say that the Main Page (and its subpages) should never get a "Source" link since it is by definition not a PRP-managed page (cf. the arguments usually advanced for moving it to projectspace)? There are various other special-case rules for the Main Page littered around already, so it certainly wouldn't be unique to PRP. I'm not aware of any Wikisourcen that use PRP for their Main Page, and would hence by design want the "Source" tab to show up there, but I guess one can never know. Or is this another camel-tongue on the side of soft-deps/lazy mode? That probably wouldn't work very well for this use case, I don't think, since for that particular progress bar you'd presumably want relatively "live" updates. Xover (talk) 09:16, 20 December 2021 (UTC)

Indeed, the main page is special at quite a deep level, as there is Title::isMainPage() provided in the core PHP. So it should be a simple fix to check that when doing the source page on the server side in TransclusionPagesModifier.php.

As for the script, I'm not sure what the completely cross-subdomain way to detect a link to the Main Page is from the client side only. From the Main Page itself, you can check mw.config.get( 'wgIsMainPage' ), but if you're on some random page and merely holding a string that says "Hauptseite" on deWS, I'm not 100% sure what you do then. You can query MediaWiki:Mainpage for the title, but that's a bit of a hack (though trivially cached). I wonder if the best option might be as drastic as adding eiexclude=mainpage to API:Embeddedin to filter it out on the server side and provide the functionality for all API users with a similar corner case. Inductiveload—talk/contribs 09:35, 20 December 2021 (UTC)
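The client-side part of this (dropping links to the Main Page or its subpages from a list of titles, e.g. the results of an API:Embeddedin query) can be sketched as a pure function. This is a minimal sketch that assumes the local Main Page title has already been obtained somehow (for example by reading MediaWiki:Mainpage); the helper names are hypothetical, and the first-letter capitalisation assumes the wiki uses MediaWiki's default case rules.

```javascript
// Hypothetical helpers: filter the Main Page and its subpages out of a title list.
// Assumes the local Main Page title is already known (e.g. from MediaWiki:Mainpage).

// Normalise a title for comparison: underscores become spaces, surrounding
// whitespace is trimmed and the first character is upper-cased
// (assumes the wiki uses MediaWiki's default first-letter capitalisation).
function normaliseTitle(title) {
  const t = title.replace(/_/g, " ").trim();
  return t.charAt(0).toUpperCase() + t.slice(1);
}

// True for the Main Page itself or any subpage of it ("Main Page/...").
function isMainPageOrSubpage(title, mainPageTitle) {
  const t = normaliseTitle(title);
  const main = normaliseTitle(mainPageTitle);
  return t === main || t.startsWith(main + "/");
}

// Keep only the titles that are not the Main Page or one of its subpages.
function withoutMainPage(titles, mainPageTitle) {
  return titles.filter((t) => !isMainPageOrSubpage(t, mainPageTitle));
}

// Usage on deWS, where the Main Page is "Hauptseite":
const titles = ["Hauptseite", "Hauptseite/Neu", "Faust I", "Hauptseiten-Archiv"];
console.log(withoutMainPage(titles, "Hauptseite"));
// → [ 'Faust I', 'Hauptseiten-Archiv' ]
```

Note that "Hauptseiten-Archiv" survives: only an exact match or a "/" subpage separator counts, so pages that merely share a prefix are kept.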

Hmm. So an extra condition for the test in ProofreadPage::onOutputPageParserOutput(), along the lines of if ($outputPage->getTitle()->inNamespace(NS_MAIN) && !$outputPage->getTitle()->isMainPage()) {…}? I thought that happened in ext.proofreadpage.article.js, but I see now it just unconditionally adds a "Source" tab whenever it is loaded and there is a #ca-nstab-main present.

For JS, if one was to start futzing with the API code, wouldn't the more consistent approach be to expose Title::isMainPage() in mw.Title? Not that lists/generators couldn't use richer filters to push more of the stuff server-side, but… Xover (talk) 11:08, 20 December 2021 (UTC)

I think it's one of those things where the actual solution is to do both (actually: all three, there's the Lua mw.title API too). There are times when you want to be able to do it all client-side (e.g. when slinging strings about which don't come from an API query), there are times when you just want the server to hand you the Right Thing (TM) in the first place and there are times when you're in a template or module and you need it to be done at render time. Inductiveload—talk/contribs 11:38, 20 December 2021 (UTC)

I have absolutely no interest in joining another proprietary chat platform, especially one which requires me to register with my phone number and doesn't allow a secondary account like Telegram. As far as I am concerned, if people have anything to say publicly they can do it in such a channel: on wiki or in IRC (or in some channel bridged to one of those).

I don't know about that script, but I imagine the fix is the same as for "jump to file" (diff): omit the links that are "Main Page" or subpages thereof. Inductiveload—talk/contribs 16:11, 19 December 2021 (UTC)

Google OCR

Hi, I have problems with Google OCR. It often does not work (in my bot, several pages needed more than 200 attempts to save a page with Google OCR). Now it does not work on Spanish Wikisource or here. I need to make many attempts before the OCR works. I don't know if the problem is mine or comes from the MediaWiki system; do you know? Shooke (talk) 15:26, 18 December 2021 (UTC)

I don't know, it's something between the mediawiki thumbnail service and the Google OCR service. The bug is tracked at phab:T296912, but I don't have any further clues to what's going on beyond what I wrote there (and I don't have access to the logs to figure it out for sure). In the meantime, you could try the Tesseract OCR instead? Inductiveload—talk/contribs 15:53, 18 December 2021 (UTC)

Thanks for answering. Google's OCR is very good with Spanish; Tesseract has quite a few shortcomings for this language. December 19 was perfect, with no errors, but today it is failing again. It seems that the error is intermittent. Shooke (talk) 02:38, 21 December 2021 (UTC)

Index:The Poems of John DonneVolume 2 - 1906.djvu

Hello. Index:The Poems of John DonneVolume 2 - 1906.djvu, created by Inductivebot, has recently been nominated for speedy deletion because the File:The Poems of John DonneVolume 2 - 1906.djvu does not exist. The summary of the file page says "File description for failed upload, pending server-side upload". What is the problem with the upload, and is it going to be fixed? --Jan Kameníček (talk) 22:28, 21 December 2021 (UTC)

@Jan.Kamenicek sorry that was a borked redirect I think. The files should have been like Index:The Poems of John Donne - 1896 - Volume 1.djvu. Deleted. Inductiveload—talk/contribs 22:33, 21 December 2021 (UTC)

Template:Auxiliary Table of Contents

Hello. It seems that something has gone wrong with the {{Auxiliary Table of Contents}}, see Page:Czechoslovak fairy tales.djvu/15. Is it possible that it is connected with this edit? --Jan Kameníček (talk) 12:33, 22 December 2021 (UTC)

@Jan.Kamenicek the issue is not related to that. The problem is that if you use un-named parameters, whitespace is not removed, and lines that begin with whitespace are made into "pre" code-like blocks. You can fix it by removing the space before (shudder) the {{Dotted TOC page listing}}, or by using the parameter name (1).

However, I personally think that an even better solution would be to use the wst-toc-aux class like this, since then you don't have to break the table up or use templates that don't export properly:

{{TOC begin|width=100%}}
|+ {{larger|CONTENTS}}
{{TOC row 2dot-1|class=wst-toc-aux|Note (not in original TOC)|vii}}
{{TOC row 2dot-1|Foobar|18}}
{{TOC end}}

CONTENTS
Note (not in original TOC) . . . . . . . . . . . . . . . . vii
Foobar . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Inductiveload—talk/contribs 12:43, 22 December 2021 (UTC)

That solution looks good. Thanks, I think I will use it. However, the extra space did not cause any problems until recently, so hopefully there are no more pages like this where this problem has suddenly arisen. --Jan Kameníček (talk) 12:54, 22 December 2021 (UTC)

Page reworked, thanks for pointing this option out, I will remember it! --Jan Kameníček (talk) 13:09, 22 December 2021 (UTC)

A question about a deleted page

I have a question. I use anchors, unless there is a wikidata item that can be filled with whatever is anchored. So, with that background I ask this:

Was this page attached to a wikidata? Early Spring in Massachusetts (1881)/Nuthatches-1854-02-24?--RaboKarbakian (talk) 16:26, 22 December 2021 (UTC)

It wasn't, but why on Earth would it be? It doesn't represent a discrete unit of text. If you need to reference it, it would be the edition Q-id (Q50824926), plus page(s) (P304), section, verse, paragraph, or clause (P958), line(s) (P7421), volume (P478), etc, and a reference URL (P854) that can include a fragment for the anchor. Spraying separate, reduplicated, transclusions of tiny snippets through the WS mainspace just so you can construct an item for it with a sitelink from Wikidata would be utter madness: putting the cart so far before the horse that you'd need another horse just to find it. Inductiveload—talk/contribs 16:42, 22 December 2021 (UTC)

I don't know. It was on my watchlist, it was about nuthatches (a lovely little bird with a terrible nasal song). It was deleted, so I don't know if I created it or added to it or moved it, and I don't even know what was there. So, I eliminated everything I don't care about and arrived at the one thing I do care about, which is a wikidata link.

Reasons to make a page for a sitelink include: biological descriptions of species, genus, or family which, even if they are not the type species, still get cited often; mathematical formulas, especially in original proofs or as manipulated from the original for use; recipes. I am sure there are more, but this is off the top of my head right now. Basically, anything that could be isolated from the text and stand alone as a "something". Type species, type genera and type families are first and foremost on that list. Biologists have been referencing type species since the 1600s, and now that technology can really make the reference (not just an L1753), rules made by "English literature majors" with deletion tools (and other implements of destruction) prevent it from happening.--RaboKarbakian (talk) 18:08, 22 December 2021 (UTC)

@RaboKarbakian Most of those things are things that you make a reference for, not an item. A paragraph of an edition is not a standalone concept that would be modelled by a Wikidata item. It can be represented in a triple store, of course, just like anything can be, but, for the same reason that paragraph 67 of Spot the Dog Goes to the Supermarket, 2nd ed. does not have its own item, it is not worthy of a Wikidata Q-id.

Even if you did make a Wikidata item for your snippet (perhaps it's very famous like a verse of the Bible*), you should not make an orphaned, duplicated Wikisource page simply so you can have a sitelink for that Wikidata item. Wikisource's layout is not driven by any perceived need to fill in sitelink boxes at Wikidata. If you need to reference a specific text location at Wikisource for a WD reference, then you can use the reference URL (P854) and an anchor.

(*) Actually, even though they have a Genesis 1:1 item, they do not have items for "Genesis 1:1 in the NIV/KJV/WTFBBQV". Inductiveload—talk/contribs 18:43, 22 December 2021 (UTC)

See commons:Category:Aporocactus flagelliformis at the top, it's TOL. The last item is the species name and behind that is a little L. That L. links to commons:Carl von Linné. They have been making that "link" since the 1600s, they being the biologists. It is there so the original can be found. There is another author behind that, as that species has had an updated and improved description. It could just as easily link to the text of the first (and updated) description of that if it were here. A text I worked on here that had a bunch of those "firsts" had them so embedded in the text that it was just easier to make a stand alone page for the species. And also, the first genus which needs the species, in that same book. It was an important book, a not so interesting portion of a series of actually interesting books, and the book as a whole is good to have.

So, I recommend that you spend some years with tree of life information and how it is set up, the different trees and the authorities and how the information fits into the several trees and branches out and then express your opinions about what qualifies as a good stand-alone or not. Or, skip that couple of years and trust someone who has done that. TOL is kind of interesting in the database management sense. Also, how much even science is politicized. I often had to decide to go with the name being used by the closest group to where the species was found. Like, New Zealand for Antarctica, etc. It is difficult for me to call anything I need to make a decision about a "science", but there you have it. Things need to have a name if they are to be discussed. But I am getting away from the point.... It would not be orphaned in the greater sense of the word. It is more likely that the complete book be orphaned due to not being able to link the important/specific parts at wikidata. If wikidata would accept anchored web links, this discussion would not be happening.--RaboKarbakian (talk) 19:06, 22 December 2021 (UTC)

@RaboKarbakian I don't understand your point. I understand Linnaean classification and biological authorities perfectly well, thank you. Commons having a category for Disocactus flagelliformis is not a surprise. The appropriate Wikisource sitelink for Disocactus flagelliformis (Q310976), if any, would be Portal:Disocactus flagelliformis, if it existed, and not some random paragraph in an 1881 book where it was mentioned. What you might do is link or reference some claim (maybe a reference on taxon name (P225) or described by source (P1343)) to the Wikidata item for a WS edition and qualify that with a page number, and a direct URL, perhaps with an anchor.

Your assertion about not accepting "anchored links" is incorrect: Wikidata can and does store full URLs:

References: reference URL (P854)
Items: described at URL (P973) and full work available at URL (P953), amongst quite a few others

Both of these will accept fragments (i.e. #...) as well as query parameters.
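As an illustration of such a value, here is a minimal sketch that builds a Wikisource URL with a fragment (and, optionally, a permanent-link `oldid` query parameter) using the WHATWG `URL` class built into browsers and Node.js. The function name is hypothetical, and so is the anchor `Nuthatches-1854-02-24`: the real anchor would be whatever ID the transcluded text actually carries.

```javascript
// Hypothetical helper: build a URL pointing at an anchor inside a Wikisource page,
// suitable as a reference URL (P854) value. Optionally pins a revision via ?oldid=.
function wikisourceAnchorUrl(pageTitle, anchor, lang = "en", oldid = null) {
  const url = new URL(`https://${lang}.wikisource.org/`);
  // MediaWiki URLs use underscores for spaces; the URL class percent-encodes
  // anything else that is not allowed in a path.
  url.pathname = "/wiki/" + pageTitle.replace(/ /g, "_");
  if (oldid !== null) {
    url.searchParams.set("oldid", String(oldid));
  }
  url.hash = anchor.replace(/ /g, "_");
  return url.href;
}

console.log(wikisourceAnchorUrl("Early Spring in Massachusetts (1881)", "Nuthatches-1854-02-24"));
// → https://en.wikisource.org/wiki/Early_Spring_in_Massachusetts_(1881)#Nuthatches-1854-02-24
```

The fragment is only interpreted by the browser, so the same page can be referenced at many different anchors without creating any new pages.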

If you mean that sitelinks cannot have fragments, then you are correct, but, despite however many years of whatever it is, you may have misapprehended what a sitelink represents: it is not a shorthand URL, it is a statement that the sitelinked pages all relate to the same concept. The only item that could reasonably have a sitelink to a page Early Spring in Massachusetts (1881)/Nuthatches-1854-02-24 would have been an item about that exact paragraph. That hypothetical item would probably have, amongst others, main subject (P921) → Sitta (Q858577) (or maybe a specific nuthatch species if you knew which).

If you did actually have such a narrowly-defined item to fill a structural need (and you probably do not have such a need, but let's say you did), it still wouldn't justify either artificially splitting up WS works or creating a "shadow realm" of redundant transclusions at WS just because that would allow a sitelink at Wikidata. Inductiveload—talk/contribs 19:50, 22 December 2021 (UTC)

Batch Downloading from Modern Journal Project

As I was looking for The Smart Set volumes, I stumbled across [19]. Do you think that you could batch download from that site? The format for the PDF seems to be https://repository.library.brown.edu/studio/item/id/PDF/ and for images it appears to be https://repository.library.brown.edu/iiif/image/picture id/full/,3600/0/default.jpg . I think that if you use the range bdr:471301 - bdr:563458 for the image link, it should download all the images on MJP. The challenge will then be to assemble them into individual volumes. Languageseeker (talk) 18:15, 24 December 2021 (UTC)
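A minimal sketch of turning those URL patterns into download lists, assuming the patterns hold for every item and that the item ID (e.g. `bdr:568723`, which appears later in this thread) goes in the `id`/`picture id` slot. The function names are hypothetical, and the ID range is the one guessed above: not every ID in it is necessarily a page image, so a real scraper would also need to handle missing items and keep its request rate polite.

```javascript
// Hypothetical URL builders for the Brown University repository used by the MJP.
const MJP_BASE = "https://repository.library.brown.edu";

// Whole-document PDF for one item.
function mjpPdfUrl(id) {
  return `${MJP_BASE}/studio/item/${id}/PDF/`;
}

// IIIF Image API path: {id}/{region}/{size}/{rotation}/{quality}.{format}.
// A size of ",3600" asks for an image 3600 px high, width scaled to match.
function mjpImageUrl(id, height = 3600) {
  return `${MJP_BASE}/iiif/image/${id}/full/,${height}/0/default.jpg`;
}

// Enumerate "bdr:NNN" IDs in an inclusive numeric range.
function* bdrIdRange(first, last) {
  const start = Number(first.replace(/^bdr:/, ""));
  const end = Number(last.replace(/^bdr:/, ""));
  for (let n = start; n <= end; n++) {
    yield `bdr:${n}`;
  }
}

// Usage: print the image URLs for the first few IDs of the guessed range.
for (const id of bdrIdRange("bdr:471301", "bdr:471303")) {
  console.log(mjpImageUrl(id));
}
```

Grouping the resulting images back into volumes would still need per-item metadata, which these URLs alone do not provide.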

@Languageseeker if there's actually a URL to get a document file from, you can just set source=url and then set the id to the URL in question. Inductiveload—talk/contribs 18:39, 24 December 2021 (UTC)

That said, I can (and will) add a "mjp" source to shortcut it. But it'll probably still just be the PDFs, since anything else is an order of magnitude slower, and the MJP PDFs seem OK for quality and OCR.

Also, related plug: the hi-res loader already understands the MJP: e.g. Page:Camera Work No. 12 (October 1905).pdf/38. Inductiveload—talk/contribs 19:12, 24 December 2021 (UTC)

Batch Scraping of Sims sets

I've had my eye on All The Year Round for a while. From the experience with previous magazines, I'm not entirely sure it's worth all the faffing about that results from trying to combine HT with IA. The IA has a complete Sims set of All the Year Round. However, it consists of 2,048 files. Major ouch. I was wondering if it would be possible to batch scrape the links. For every id, there is also entry_meta.xml where <volume></volume> maps to volume; <issue></issue> maps to issue or can be "CONTENTS" for Table of Contents; <date></date> is a bit trickier because it can either be year-month-day or Text year (e.g. Christmas 1859). Do you think this is possible? Languageseeker (talk) 16:32, 26 December 2021 (UTC)
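The field mapping described here can be sketched as a small parser that turns one entry_meta.xml into a spreadsheet row. This is a minimal sketch under loud assumptions: it uses regular expressions instead of a real XML parser, so it assumes each of `<volume>`, `<issue>` and `<date>` appears once, without attributes or CDATA; the function names and the example ID are hypothetical.

```javascript
// Hypothetical sketch: map an IA entry_meta.xml to one CSV row.
// Regex-based, so it assumes simple, attribute-free, non-nested tags.

// Text of the first <tag>...</tag> element, or null if absent.
function tagText(xml, tag) {
  const m = xml.match(new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`));
  return m ? m[1].trim() : null;
}

// <date> is either year-month-day or free text ending in a year ("Christmas 1859").
function parseSimsDate(raw) {
  if (raw === null) return { year: null, month: null, day: null, label: null };
  const iso = raw.match(/^(\d{4})-(\d{1,2})-(\d{1,2})$/);
  if (iso) {
    return { year: Number(iso[1]), month: Number(iso[2]), day: Number(iso[3]), label: null };
  }
  const textYear = raw.match(/^(.*?)\s*(\d{4})$/);
  if (textYear) {
    return { year: Number(textYear[2]), month: null, day: null, label: textYear[1] || null };
  }
  // Anything else is kept verbatim for manual checking.
  return { year: null, month: null, day: null, label: raw };
}

// "CONTENTS" in <issue> marks a table-of-contents item rather than an issue number.
function parseEntryMeta(id, xml) {
  const issue = tagText(xml, "issue");
  return {
    id,
    volume: tagText(xml, "volume"),
    issue: issue === "CONTENTS" ? null : issue,
    isContents: issue === "CONTENTS",
    ...parseSimsDate(tagText(xml, "date")),
  };
}

// One CSV line per item; every field is quoted so commas in labels are safe.
function toCsvRow(row) {
  const cols = [row.id, row.volume, row.issue, row.isContents, row.year, row.month, row.day, row.label];
  return cols.map((v) => `"${String(v ?? "").replace(/"/g, '""')}"`).join(",");
}

const row = parseEntryMeta("example-id", "<volume>2</volume><issue>CONTENTS</issue><date>Christmas 1859</date>");
console.log(toCsvRow(row));
// → "example-id","2","","true","1859","","","Christmas"
```

Joining the rows for all 2,048 items gives a single sheet that can then be checked and completed by hand; dates that match neither pattern keep their raw text in the label column so they are easy to spot.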

@Languageseeker you mean can you scrape the contents of that IA collection and construct the information for the upload (ie the same data as presented in the spreadsheets) from that? If so, it should be possible. The real question is: are the SIM sets going to give us what we want, or would we prefer other sources and backfill from the SIM data when needed? Inductiveload—talk/contribs 23:04, 26 December 2021 (UTC)

Yes, in essence, it would transform the 2,048 files into one Excel file that can then be manually verified and filled out. I've thought about this quite a bit, and in many cases we're basically dealing with SIMS vs Google scans. The Smart Set SIMS scans show that OCR produces excellent results, which is all that matters. The SIMS scans have more background noise, but the Google scans have more aggressive noise reduction that can obscure or remove parts of the images. The SIMS scans are also more likely to be complete. I think that in almost all cases the SIMS set might actually produce better results with fewer scan repairs needed. Honestly, SIMS will work and save an enormous amount of labor from having to track down 2,048 issues.

BTW, it seems that The Smart Set sims batch upload failed at 80:4. Languageseeker (talk) 23:26, 26 December 2021 (UTC)

The Smart Set

Sorry about the mix-up with the volumes. Must have been a product of tiredness. What should we do about the volumes that only exist as sim sets? Should I make a batch file to upload them individually, or is there a way to combine the issues into volumes? Languageseeker (talk) 11:54, 23 December 2021 (UTC)

@Languageseeker It's OK, it happens. Do we have a complete volume of any of them? Otherwise maybe just leave them as they are (63:1, 64:1, 64:3) for now and "someone" can merge the indexes and pages when/if complete volumes turn up somewhere? Inductiveload—talk/contribs 12:18, 23 December 2021 (UTC)

I've looked and can't find them anywhere. I'll make a batch file to upload the individual issues in the next few days. As always, thanks for your help. Languageseeker (talk) 12:58, 23 December 2021 (UTC)

Okay, sounds good. You're welcome :-) Lippincott's is coming next. I'm getting my money's worth from the ISP this month!

In general, we can upload single issues perfectly well. Just set an issue heading and I'll make it work somehow. The bulk of the grind is downloading and converting the images in the first place, and uploading the file with some kind of sane-enough metadata. Combining multiple indexes can happen "later", if and when complete volumes deign to appear.

It's not really a huge issue if we have a bit of a mishmash of volumes and issues - it's just easier not to if vaguely possible. Manually recombining issues and faking volumes up is more work than just living with the mishmash. After all, after transclusion, it doesn't even show at all! Inductiveload—talk/contribs 13:08, 23 December 2021 (UTC)

So, it seems that the MJP pdfs are b&w, while the images are full-color. Do you think it's possible to download the images, make them into DJVU, and upload them? Languageseeker (talk) 23:35, 30 December 2021 (UTC)

I technically could, but it'll be a pretty huge amount of downloading and processing, plus I'll need to write a backend for the download script to scrape the IIIF manifest. The high-res loader works and as far as I can tell the magazine was printed in black and white anyway, so it's not really losing any detail. Images should never be cropped from the PDF (or DJVU) anyway. What's the use case here? Inductiveload—talk/contribs 00:17, 31 December 2021 (UTC)

Also at least some are in colour (I haven't looked into it further, but this one has a colour plate at page position 12 and all other pages are in colour: https://repository.library.brown.edu/studio/item/bdr:568723/PDF/) Inductiveload—talk/contribs 00:33, 31 December 2021 (UTC)

Alright, I don't want to pile even more work on you. Happy New Year! May it bring you all the best! Languageseeker (talk) 02:12, 31 December 2021 (UTC)

p-p-p-p-p- p-wrapping everywhere

Sigh. P-wrapping is the bane of every attempt at sanity. All-one-stanza (p)poem spanning three pages, which needs LST due to intermixed textual notes, where the middle page ended up getting wrapped in a p tag (with attendant vertical spacing) unless I manually fudged the LST end tag onto the same line as the end of the ppoem. Might be worthwhile keeping in mind if mysterious "stanza breaks" start popping up. Xover (talk) 09:27, 25 December 2021 (UTC)

So the first day of Christmas is a paragraph in a parse tree is it? I'll keep my eyes half open for bad interactions but at some point I'm just going to grumpily shift the blame onto "Mediawiki" in general and gesture vaguely at "the parser". Merry everything! Inductiveload—talk/contribs 23:09, 26 December 2021 (UTC)

Heh heh. Merry merry to you too. And in this case it is most definitely MW's fault: there's nothing we can do on the content side to affect this, except pray someone will tackle T253072 (cf. T134469) eventually.

PS. If you have any suggestions on how to handle /164 and /165 I'm all ears. I really don't want to do them as dumb images-of-text, but I can't think of any way for us to even approximate that layout without full-on arbitrary webfont support. Xover (talk) 07:17, 28 December 2021 (UTC)

Any thoughts on this approach to /164 and /165? I'm not particularly happy with it, but it's the least worst one I was able to come up with. Xover (talk) 11:44, 30 December 2021 (UTC)

@Xover whoops sorry, I forgot to reply here. I really don't think there's much more we can do here. Even if we were to ship the modern equivalent fonts, it's still not quite the same as the original, and the exact form of the font is the content in this case. Even if you could channel the spirit of William Caslon through FontForge and generate a perfect reproduction font, we can't actually ship it. I'd say a nice clear image is as good as you can reasonably get.

BTW, if you want cold sweats about font reproduction: Portal:Typography#Specimen_catalogs. Inductiveload—talk/contribs 12:01, 30 December 2021 (UTC)

Yeah, no, I wasn't concerned with perfect fidelity: I'm not that geeky about fonts. But all those do have rough equivalents in modern computer fonts (most if not all available in free beer-ish variants) so in a perfect world… *sigh*

But, in any case, I meant the technical approach with the overlain transparent text to give cut&paste'ers and TTS systems something sensible to work with. Do $(".wst-iot-text").css("color", "red"); in your console to see what's going on. I may move this to {{iot}} ("Image of Text") if I decide it has sufficiently general applicability to these cases. Xover (talk) 12:34, 30 December 2021 (UTC)

@Xover oh I seeee, well it looks sensible enough to me. It works on export to an extent, at least: it works in Koreader, the text is reduplicated under the image in Moon+Reader. Inductiveload—talk/contribs 14:02, 30 December 2021 (UTC)
