
The Scriptorium is Wikisource's community discussion page. Feel free to ask questions or leave comments. You may join any current discussion or start a new one; please see Wikisource:Scriptorium/Help.

The Administrators' noticeboard can be used where appropriate. Some announcements and newsletters are subscribed to Announcements.

Project members can often be found in the #wikisource IRC channel (webclient). For discussion related to the entire project (not just the English chapter), please discuss at the multilingual Wikisource. There are currently 414 active users here.



Fix Template:Block right

I've noticed two issues with {{Block right}} and {{Block right/s}} which I believe should be addressed.

  • First, on {{Block right}}, I believe the handling of "gutter" is erroneous. For all the other parameters, the relevant CSS is added only if that parameter is present, but for "gutter", the CSS is added only if "offset" is present. It looks to me like a simple copy-and-paste error, but it produces unexpected results. Fixing should be low impact, as it would only affect pages with "gutter" and without "offset", which are currently not rendering as intended anyway.
  • Second, I think {{Block right/s}} should support all the same parameters as {{Block right}}, but it does not, even though they share the same doc page which implies that they are supported. Fixing again should be low impact, as it is strictly additive and should not change any existing uses of this template (unless they use these parameters in which case I would argue they are not rendering as intended).
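
To illustrate the first issue, here is a minimal Python sketch of the guard logic, not the template's actual wikitext. The function name `block_right_style` and the CSS properties `margin-right` and `padding-left` are assumptions chosen for illustration; only the parameter names "offset" and "gutter" come from the report above.

```python
def block_right_style(params, buggy=False):
    """Build an inline CSS string from whichever parameters are present.

    Hypothetical model of the template logic: the real template is wikitext,
    and the CSS properties used here are assumed, not taken from it.
    """
    rules = []
    if params.get("offset"):
        rules.append(f"margin-right:{params['offset']};")
    # Suspected copy-and-paste error: the gutter rule is guarded by "offset"
    # rather than by "gutter" itself.
    gutter_guard = "offset" if buggy else "gutter"
    if params.get(gutter_guard):
        rules.append(f"padding-left:{params.get('gutter', '')};")
    return "".join(rules)


# A page that sets only "gutter" gets no gutter CSS under the buggy guard.
print(repr(block_right_style({"gutter": "2em"}, buggy=True)))
print(repr(block_right_style({"gutter": "2em"})))
```

Under these assumptions, only calls that pass "gutter" without "offset" (or "offset" without "gutter") produce different output between the two variants, which matches the low-impact claim above.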

I'd like to propose that these two issues be fixed (and if this isn't quite the right sort of topic for a "Proposal", my apologies and I can move this down to general discussions). — Dcsohl (talk) 17:55, 28 April 2023 (UTC)

@Dcsohl: Does this look like what you want? Template:Block right/sandbox, Template:Block right/s/sandbox. CalendulaAsteraceae (talk • contribs) 04:18, 4 May 2023 (UTC)
@CalendulaAsteraceae Oh yes, that looks quite nice. I like the reuse of the "code" so the two stay in sync! — Dcsohl (talk) 14:53, 4 May 2023 (UTC)
  This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. — Dcsohl (talk) 16:48, 17 May 2023 (UTC)

Bot approval requests

Repairs (and moves)

Designated for requests related to the repair of works (and scans of works) presented on Wikisource

See also Wikisource:Scan lab

Other discussions

Policy on substantially empty works

[This is imported from WS:PD, where it applies to multiple current proposals, and several other works].

We have quite a few cases of works that are "collective" or "encyclopaedic" in that they comprise many standalone articles of individual value, which are basically just "shell pages", with no substantial content of any sort, not even imported scans or Index pages. For example, and this isn't intended to make any statement about these specific works, they're just examples and they may well get some work done soon during their respective WS:PD discussions:

Based on the usual rate of editing for things like that, unless dragged up into a process like WS:PD, they'll remain that way a very, very long time. I think perhaps there might be a case to host a mainspace page for such a work, even though there is zero, or almost zero, actual content. Do we want:

  • Mainspace pages where there is a tiny bit of information like header notes, scan links and maybe detective work on the talk page (not in this case). This provides a place for people to incrementally add content. It also gives "false positive" blue links, since there is actually no "real" content from the work itself, or
  • Do not have a mainspace page until there's some content. Only host this in terms of author/portal scan links, much like we do for something like a novel.

Personally, I lean (gently) towards #2, but with a fairly low bar for how much content is needed. Say, Indexes, basic templates, a title page and one example article. Ideally, a completed TOC if practical, especially for periodical volumes/numbers. It is fair to not wish to transcribe entire volumes of these works, it is fair to not want to import dozens of scans when you only wanted one, it is fair to only want an article or two, but it's not fair, IMO, to expect the first person who wants to add an article to have to do all the groundwork themselves, despite having been lured in with a blue link. That onus feels more like it should be on the person creating the top-level page in the first place.

I do see some value in periodical top pages with decent lists of volumes and scans where known, because these are often tricky and fiddly to compile from Google books/IA/Hathi, so it's not useless work, even if there are no imported scans (though imported is better than not).

We currently have a large handful of collective works listed for deletion in various levels of "no real content", and, furthermore, every single periodical that gets added can fall into this situation unless the person who adds, so I think we could have a think about what we really want to see here. Inductiveload (talk/contribs) 15:43, 3 July 2020 (UTC)

  • I believe that, if there is no scan as an Index: page, the main-namespace page should not exist unless it is being actively completed or is already mostly completed. A few pages (of the volume itself) is not very helpful, and is entirely useless if there is no scan given. TE(æ)A,ea. (talk) 15:59, 3 July 2020 (UTC).
  • I think such preparatory information would ideally be on more centralized WikiProject pages (for the broad subject), both for clarity and to assist in keeping different efforts consistent -- but that it certainly should be retained as visible to non-admins. I think that the red vs blue link issue is minor (but not totally negligible) and outweighed by the disadvantages of hiding the history of previous efforts. I strongly encourage redirecting such pages to appropriate WikiProject pages (after copying over the details there). JesseW (talk) 18:11, 3 July 2020 (UTC)
  • @JesseW: I agree that history shouldn't be deleted, but I think we should approach this in terms of what we want to see from these works, rather than what to do with the handful of examples at PD. There are hundreds of periodicals we could have but don't, and this applies to those as well. If we can come to a conclusion about what is and isn't wanted, we can make all the deletion requested works conform to that easily enough. Inductiveload (talk/contribs) 20:55, 3 July 2020 (UTC)
  • I think these pages are necessary to list index pages and external scans of multi-volume works (such as encyclopaedias and periodicals) especially if they are wholly or partly anonymous or have many authors or are simply large. I think it makes no difference whether such pages are in the mainspace, the portal space or the project space (except that it is harder to find pages outside the mainspace). The point is that these works often have so many volumes (often dozens or hundreds) that they must have their own page, and cannot be merged into a larger portal or wikiproject. If the community starts insisting on index pages, what will happen is the rapid upload of a large number of scans for the periodicals that already have their own page. Likewise if the community insists on transclusion. I also think it is reasonable to have a contents page in the mainspace, as it allows transclusion of articles. Most importantly, new restrictions should not immediately apply to existing pages that were created before the introduction of the restrictions. This is necessary to prevent a bottleneck. James500 (talk) 23:55, 3 July 2020 (UTC)
move the works to a maintenance category, and i will work them; delete them and i will not: i find your sword of Damocles demotivating. Slowking4 (Rama's revenge) 01:55, 5 July 2020 (UTC)
@User:Slowking4: I am not proposing a sword of Damocles. I agree that the imposition of deadlines is counter-productive. I do not support the deletion of any of these pages. I would prefer to see them improved. James500 (talk) 04:38, 5 July 2020 (UTC)
TEA is on his usual deletion spree. not a fan. will not be finding scans to save texts, any more. he can do it. Slowking4 (Rama's revenge) 00:15, 6 July 2020 (UTC)
The entire point of moving this here, and not staying at WS:PD, is to decouple from the emotions that get stirred up in a deletion discussion. Let's keep deletion out of this. If we come up with some idea of what we do and don't want, then we can go back to WS:PD and decide what to do. I imagine that all that will be needed will be a fairly limited amount of housework to bring those works up to some standard that we can decide on here, and all the collective works there will be easy keeps. Hopefully with some kind of consensus that we can point at to outline a minimum viable product for such works going forward. There are hundreds and thousands of dictionaries, encyclopedias, periodicals and newspapers that we could/will, quite reasonably, have only snippets of. How do we want to present them? What, exactly, is the minimum threshold? Let's head all those future deletion proposals off at the pass, because deletion proposals often cause friction. Inductiveload (talk/contribs) 00:47, 6 July 2020 (UTC)
and yet deletion is the default method to "motivate" quality improvement. i reject your assertion that "emotions get stirred in a deletion discussion", rather, anger is a valid response to a repeated broken process being kicked down on the volunteers. it is unclear that a minimum threshold is necessary, rather a functional quality improvement process is. until we have one, you should expect to see this periodic stirring of emotions, as the non-leaders act out. Slowking4 (Rama's revenge) 11:53, 9 July 2020 (UTC)
@Slowking4: Thank you for presenting this opinion, and I'm sorry if I have not made myself clear. We do need to figure out how to avoid a de-facto process of using WS:PD as an ill-tempered ad-hoc venue for "forcing" improvements on people who have somehow managed to generate works that are so in need of improvement that another user has nominated them for deletion. Please also consider looking at #Re-purpose_WikiProject_OCR_to_WikiProject_Scans for an idea to have a "functional quality improvement process" to which such works could be referred upon discovery rather than kicking them straight to WS:PD. If you have other ideas or you have previously suggested something similar to address these frustrations, you could detail them there. Personally, I think we should always prefer improvement over deletion. Exactly what the remediation is (refer to a putative WP:Scans, WS:Scriptorium/Help, directly WS:PD as now, or something else) is not what this thread is for. This thread is for discussing what, if anything, should be the tipping point for deeming a page "lacking" and doing something about it, whatever "something" is. I don't think I can be much clearer that this is not about deletion. If we also have a better venue for improvements, then that's even better.
For example, my personal feeling and !vote on A Critical Dictionary of English Literature is "keep and improve", despite it lacking scans or even links to scans, having only one article and no other content, not even a title page: in short, failing almost every criterion suggested so far in this thread. The only thing it does have is good text quality of the one entry. I personally do not think this work should be deleted, but I do think it should be improved in specific ways. The first half of that sentence is not the focus of this discussion, the second half is. Inductiveload (talk/contribs) 14:18, 9 July 2020 (UTC)
deletion threat has been an habitual method of communicating by admins since the beginning of the project. and text dumps have been habitual following in the gutenberg example. culture change and process change would be required to change those behaviors. we could make it easier to start scan backed works, but the wishlist was not supported. Slowking4 (Rama's revenge) 21:00, 14 July 2020 (UTC)

I don't think this needs to be much of an issue going forward -- we all agree that it's OK to create Index pages for scans, even if none of the Pages have been transcribed yet; so the only case where this would come up is recording research where no scan has yet been identified as suitable to be uploaded. And for that, I still think a WikiProject page is the right location, not mainspace. (Or, if you must, your userpage.) JesseW (talk) 00:59, 6 July 2020 (UTC) I realized I may not have been clear enough here -- in my view, the ideal process goes like this:

  1. Decide on a work you are interested in (in this case, a periodical/encyclopedic one) -- don't record that anywhere on-wiki (except maybe your user page)
  2. Find and upload (to Commons) a scan of one part/issue/etc of the work.
  3. Create a ProofreadPage-managed page in the Index: namespace for the scan. (You can stop after this point, without worry that your work will later be discarded.)
    1. Put further research (on other editions, context, possible wikification, etc.) on that Index_talk page.
    2. Proofread a complete part of the scan (an article from the magazine issue, a chapter from the book, an entry from an encyclopedia, etc.) and transclude it to the mainspace (and create necessary parent pages), and put the further research on the Talk: page of the parent mainspace entry.

If you can't find any scan, and don't want to leave your working notes on your user page, put them on a relevant WikiProject's page.

If you come across such research done by others and misplaced, follow the above process to relocate it to an appropriate place, then redirect the page where you found it to the new location. That's my proposal. JesseW (talk) 01:08, 6 July 2020 (UTC)

@JesseW: It's not clear to me in your above whether when you use the term "index" you refer to a ProofreadPage-managed page in the Index: namespace, or a general wikipage in the main namespace on which an index-like structure (and/or a ToC, or similar) is manually created. Could you clarify? --Xover (talk) 05:14, 6 July 2020 (UTC)
I meant the namespace. Clarified now. JesseW (talk) 05:17, 6 July 2020 (UTC)
  • Hoo-boy. Y'all sure know how to pick the difficult issues…
    My general stance is that: 1) scans and Index: (and Page:) namespace pages have no particular completion criteria to meet to merit inclusion, and can stay in whatever state indefinitely (there may be other reasons to get rid of them, but not this); and 2) the default for mainspace is that only scan-backed complete and finished works that meet a minimum standard for quality should exist there.
    That general stance must be nuanced in two main ways: 1) there must be some kind of grandfather clause for pre-existing pages; and 2) there must exist exceptions for certain kinds of works that meet certain criteria. I won't touch on the grandfather clause here much, except to say I'm generally in favour of making it minimal, maybe something like "No active effort to get rid of older works, but if they're brought to PD for other reasons they're fair game". The design of a grandfather clause for this is a whole separate discussion, and an intelligent one requires analysis of existing pages that would be affected by it. It is always preferable to migrate pages to a modern standard, so a grandfather clause is by definition a second choice option.
    Now, to the meat of the matter: the exceptions…
    We have a clear policy to start from: no excerpts. Works should either be complete as published, or they should not be in mainspace. But quite apart from the historical practices that modify this (which are somewhat subjective and inconsistent, so I'll ignore them for now), there are some fairly obvious cases that suggest a need for more nuance than a simple bright-line rule alone provides. The major ones that come to mind are: 1) massive never-completed projects like EB1911 or the New York Times (EB because it's big; NYT because new PD issues are added every year); 2) compilations or collections of stand-alone works with plausible claim to independent notability.
    For encyclopedias and encyclopedia-like things, we have to accept some subsets due to sheer scale of work. But when that is the grounds for exception, there needs to be some minimum level of completion. I'm not sure I can come up with a specific number of pages/entries or percentage, but it needs to be more than just a single entry (and, obviously, only complete entries). For this kind of exception to apply, I think it needs to be a requirement that the framing structure for it is complete: that is, the mainspace page should give a complete overview of the relevant work even if most of it is redlinks. That includes title pages and other prolegomena when relevant. For a periodical like the NYT, that means complete lists of issues with dates and other such relevant information (e.g. name changes etc.). For preference, these kinds of things should be in Portal: namespace or on a WikiProject page until actually complete, but that will not always be practical (EB1911 and NYT are examples of this). Mainspace or Portal:-space should never contain external links (i.e. to scans) or links to Index: or Page: space (except the implied link of transclusion and the "Source" tab in the MW UI provided by ProofreadPage).
    For exception claimed under independent notability there are a couple of distinct variants.
    Newspaper or magazine articles need to have a certain level of substance in addition to a specific identifiable byline (possibly anonymous or pseudonymous, and possibly identified after the fact by some other source, such as the Letters of Junius) in order to qualify. It is not enough to ipso facto be a newspaper article, a magazine article, a poem, or an encyclopedia entry. On the one hand we have things like dictionaries and thesauri, where an entry could be as little as two words. Or a one-sentence notice without byline in a newspaper. Or two rhymed lines (technically a poem) within a 1000-page scholarly monograph.
    To merit this exception it should be reasonable to argue that the "work" in question should exist as a stand-alone mainspace page (not that we generally want that; but as a test for this exception, it should be reasonable to make such an argument). This would clearly apply to moderately long entries in the EB1911 written by a known author that has their own Wikipedia article. It would apply to short stories or novella-length serialisations in literary magazines by authors that have later become famous (or "are still …"). It would apply to various longer-form journalistic material from identifiable journalists (again, rule of thumb is notable enough for enWP article), including things in magazines that have similar properties. For most periodicals the most relevant atomic (indivisible) part is the issue not the entry or article, but with some commonsense exceptions.
    It would, generally, not apply to things that are works by a single author, like a scholarly monograph that just happens to be arranged in "entries" rather than chapters. It would not apply to things that are essentially lists or tables of data. It would not apply to short entries in something encyclopedia-like or entries that are not by an identifiable author. The OED for example, iirc, is a collective work where entries are by multiple not individually identifiable authors (and each entry is mostly very short too); only the overall editor is usually cited.
    For works claiming this exception too the framing structure should be complete, even if most of it are redlinks. The same general rules about Portal:/WikiProject and no external or Index:-space links apply. An exception would be for periodicals where new issues enter the public domain every year; and we should generally avoid including even redlinks for the non-PD issues here (but may allow them in a WikiProject page). For non-periodical works in multiple volumes where some volumes were published after the PD cutoff, including listings for the non-PD volumes (but not links to scans; those are a copyvio issue) is ok.
    Poems, short stories, and novellas are a special class of works here. A lot of these were first published in a magazine (possibly serialized), and a lot of them exist as multiple editions in substantially the same form. Some exist in multiple versions. These should all primarily exist the same way as chapters as part of their various containing works; but there are some cases where we might want to have, for example, a series of connected pages of the poems of Emily Dickinson. I am significantly ambivalent about this practice, as it amounts to making our own "edition" or "collection" of her poems (in violation of several of our other policies), but I acknowledge that it is an established practice and it is something that has definite value to our readers. It may be that it is actually a practice that should be governed by its own dedicated policy rather than be handled within these other general policies.
    For the sake of example; applying this to the works Inductiveload listed at the start of this thread would shake out something like this:
    Auction Prices of Books—This work appears to have no sensible subdivisions and is in any case by a single author. I see no obvious reason to grant this work an exception, except under sheer volume of work and even there I would want to see both a substantial proportion completed and some kind of ongoing effort towards completion (no particular time frame, but definitely not infinite and definitely not as an effectively abandoned project). In a deletion discussion I would very likely vote to delete the mainspace pages here (but, as nearly always, to keep the Index: and Page: namespace artifacts). I don't see this as a reasonable candidate for a Portal:, nor really a good fit for a WikiProject (though I probably wouldn't object to a WikiProject if someone really wanted one).
    Central Law Journal/Volume 1—A single volume is too little, so I would want to see a complete structure for the entire Central Law Journal, with level of detail for each volume similar to the one existing volume. Each article in the journal can be individually considered for a stand-alone work exception; but for the collection I would want to see at minimum a full issue finished to justify having the mainspace structure, and preferably multiple issues (in a deletion discussion I might insist on multiple issues). Index: and Page:-space artefacts can, of course, stay. A Portal: might make sense for selections from the journal, of articles that meet the standalone work exception. A WikiProject to coordinate work and track links to scans etc. might be a decent fit here, if someone wanted that. As it currently stands I would probably vote delete for the mainspace artefacts (with option to move whatever content has reuse value to a non-mainspace page for preservation; and undeleting if someone wants to work on something is a low bar).
    A Critical Dictionary of English Literature—The top level mainspace page has near-zero value, existing only to link to the single transcribed entry. For a credible claim to exception to exist it would need to be a complete framework for the work as a whole, and significantly more than a single entry must be complete. I would probably also want to see ongoing work, unless a substantial percentage of the entries were complete. The single finished entry is eligible to claim a standalone work exception, but I think it probably would not meet my bar for that (I might be wrong; and the rest of the community might judge it differently). In a deletion discussion I would probably vote to delete all the mainspace artifacts here (as always keeping Index:/Page: stuff) but with a definite possibility that I might be persuaded on the one completed entry (an absolute requirement for convincing me would be to scan-back it: as a separate issue, my tolerance for grandfathering of non-scan-backed works is small, and effectively zero for new/non-grandfathered works).
    Bradshaw's Monthly Railway Guide—Would need a full framework and a number of individual issues finished to merit a mainspace page. I see no credible subdivisions for a standalone work exception, but might be persuaded otherwise if, say, one of the train tables was used as a (reliable primary) source in a Wikipedia article (implying some sort of notability beyond just being raw data). In a deletion discussion I would probably vote to delete all mainspace artifacts here. If anyone made the argument, I would entertain the notion that there is value in treating train tables like poems, and hosting a series of train tables like we do Dickinson's poems; but that would require a substantial number of them completed.
    For everything above my stance is nuanced by a willingness to accept temporary exceptions for things that are actively being worked: active being operative, but with no particular deadline to complete the work. We have differing amounts of time available, and some works are so labour-intensive or tedious to do, that my personal threshold for "active" is a pretty low bar to clear. If it's months and years between every time you dip in and do a bit I might start to get antsy, but days or weeks probably won't faze me. And that the projected time to completion is very long at that pace is not particularly a problem so long as it is not infinite. Within those parameters I would always tend to err on the side of letting contributors just get on with it in peace, regardless of any of the policy-like rules sketched above.
    I also want to emphasise that I think this is a very difficult issue to deal with. There are a lot of competing concerns, and a lot of grey areas that will likely take individual discussions to resolve. My balance point on this issue is partly formed by a broader concern about our overall quality (we have waay too many works of plain sub-par quality, and too many not up to modern standards) and a hope that by preventing the creation of these kinds of works (rather than deleting them after creation) we will be able to retain the good and desirable exceptions without dragging down quality, and without the traumatic and stressful events that deletions and proposed deletion discussions are.
    And for that very reason I am grateful this issue was brought up here for discussion, and I hope we can end up with some clear guidance, possibly in the form of a policy page, going forward. And in any case, since it will create de facto policy, this is a discussion that needs to stay open for a good long while (there are several community members that have not yet commented whose opinion I would wish to hear before closing this), and depending on how well we manage to structure the consensus, may also require a formal vote (up in the #Proposals section). --Xover (talk) 09:03, 6 July 2020 (UTC)
  •   Oppose. It is becoming clear that a policy on incomplete works in the mainspace is going to place enormous pressure on individual editors. I think it would be more effective to start a wikiproject devoted to scan-backing works that lack scans and so on. James500 (talk) 12:14, 6 July 2020 (UTC)
    • @James500: FYI, this thread was made in order to provide an exception to the current policy of "no excerpts". A literal reading of the policy as it stands has a plausible chance of coming down delete on the mainspace pages over at WS:PD. This thread is a chance to come up with a better way to support such partial collective works. That we have several substantially incomplete and abandoned collective works lolling around in mainspace is actually the result of laxity in respect to stated policy (not to say I think it's a bad thing). The deletion proposals, whatever you may think of them, are actually not in contradiction to policy. That said, as always, there is scope to adjust policy. Which is what this is.
    • Now, in terms of a WikiProject to scan back works, I think that is a good idea. See #Re-purpose_WikiProject_OCR_to_WikiProject_Scans above, which proposed to reboot Wikiproject OCR as a scan-backing Wikiproject. Inductiveload (talk/contribs) 14:40, 6 July 2020 (UTC)
      • The policy says "When an entire work is available as a djvu file on commons and an Index page is created here, works are considered in process not excerpts." A literal reading of that policy is that no scan-backed work is an excerpt (it is expected to be completed eventually). Further the policy refers to "Random or selected sections of a larger work". A literal reading of that expression is that it does not include lists of scans, or auxiliary content tables, as they are not "sections" (they are not part of the work), and that not every incomplete portion of a work is either "random or selected" (which would not include starting from the beginning and getting as far as you can, with intent to finish later). I could probably argue that an encyclopedia article or periodical article is a complete work. James500 (talk) 15:16, 6 July 2020 (UTC)
  • Nice wall of text, Xover (and I say that with great respect!) -- it generally makes sense and sounds good to me. As another hopefully illustrative example, take The Works of Voltaire, which I've been digging thru lately. I think this would very much satisfy your criteria as a large work, with sufficient scaffolding to justify the mainspace pages that exist for it. I would love to hear others' thoughts on that. JesseW (talk) 16:07, 6 July 2020 (UTC)
    @JesseW: Yeah, apologies for the length. Brevity is just not my strong suit.
    The Works of Voltaire probably qualifies on sheer scale of work, yes. I don't think the current wikipage at The Works of Voltaire is quite it though: as it currently stands it is more WikiProject than something that should sit in mainspace (its contents are for Wikisource contributors, to organise our effort, not our readers, who want to read finished transcriptions). It also mixes a work page with a versions page in a confusing way. So I would probably say… Move the current page to Wikisource:WikiProject Voltaire; create a new The Works of Voltaire as a pure versions page, linking to…; The Works of Voltaire (1906), that is set up as a work page with the cover and title (and other relevant front matter) of the first volume, and an AuxTOC (and possibly also the {{Works of Voltaire}} volume navigation template). I don't know how tightly coupled the volumes of this edition are (does the first volume have a common ToC or index of works for all the volumes?), so some flexibility on format may be needed to make sense. But as a base rule of thumb it should start from a regular works page and deviate only as needed to accommodate this work (mainly the size is different).
    In any case… With a volume or two completed (they're only ~350 pages each) I'd be perfectly happy having something like that sitting around. With less than that I'd possibly be a bit more iffy, but it's hard to put any kind of hard limit on that. And with somebody actively working on it I'd be in no hurry whatsoever regardless of current level of completion.
    PS. I'm pretty sure a large proportion of the contents of these volumes are works that would qualify under "standalone works" that could exist independently in mainspace, regardless of what's done with the The Works of Voltaire page. Even his individual poems and essays can presumably make a credible claim here (because it's Voltaire; less famous authors would have a higher bar). Better as part of the edition, but also acceptable on their own. --Xover (talk) 16:56, 6 July 2020 (UTC)
  • @JesseW: I personally take no issue with this page's existence (actually I think it's a nice work and a good way to allow an important author's works to be slotted in piece-by-piece). I have some general comments which overlap with this thread (written before Xover's reply, so pardon the overlap):
    • First off, I differ with Xover in terms of the scan links: I think they're better than nothing, and I don't see much value in duplicating the volume list onto an auxiliary page just to add scan links. However, I can sympathise with the sentiment that our mainspace shouldn't direct users off-wiki (or at least off-WMF). But if we don't have the scans, and that's what the user wants, they're leaving anyway. Real answer: import moar scans!
    • No scan links are necessary where the volume exists in mainspace and is scan-backed (e.g. v3)
    • Ext scan links should only be used when there is no Index page or imported scan. Use {{small scan link}} or {{Commons link}} when possible (e.g. v2)
    • The first volume list could probably be in an AuxTOC to mark it out as WS-generated content.
    • The "Other editions" section belongs on an auxiliary namespace page (Talk, Portal or Wikisource). I suggest the Talk page is best in this case. Inductiveloadtalk/contribs 17:35, 6 July 2020 (UTC)
  • @Xover: I am in agreement with the majority of what you say. Particularly, I think a framework around any collective work (be it a single-volume biographical dictionary or a 400-issue literary review spanning 80 years) is the critical prerequisite, plus at least some scans, the more the merrier. Where I think I differ:
    • I am inclined to be a bit more relaxed in terms of how much of a work we need. As long as a single article exists, it's not "trivial" (e.g. only a short advert or some incidental text like a "note to correspondents", as opposed to an actual article), it's well-formatted and scan-backed, and a complete framework exists, including front matter and a TOC, such that it is easy for anyone to slot in new pieces, I'd be fairly happy. Lots of periodicals have all sorts of tricky bits like tables of stocks or weather tables, and writing into policy that those must be proofread in order to get the "real" articles into mainspace would have a chilling effect, in my opinion. If you allowed an exception, it would be verbose and tricky to capture the spirit without saying "unless, like, it's totally, like, hard, man".
    • I am not dead against scan links in the mainspace at the top level, when such a top-level page exists. See my comments on Voltaire above. I am against them where they could sensibly be on an Author page and they are the only mainspace content.
    • I am ambivalent on the presence of, e.g., disjointed train timetables. It's not my thing to have a smattering of random timetables, but as long as they're individually presented nicely, it's not too offensive to my sensibilities. I might question the sanity of someone who loves doing tables that much, but whatever floats the boats! Also, I think that this might circle back to "good for export" - a mark which certainly would require completed issues or volumes. If you want to get that box ticked, you have to do it all.
    • Re the "notability" aspect of individual articles, I'm not really bothered by that, as I don't think we'll see a flood of total dross because few people really want to take the time to transcribe 1867 articles about cats in a tree from the Nowhere, Arizona Daily Reporter, and, actually I think some of the "dross" can be quite interesting in a slice-of-life kind of a way (always assuming well-formed and scan-backed). And the real dross is usually so bad (no scans, raw OCR, etc) that it can be dealt with outside of this topic. I think part of the value of WS is the tiny, weird and wonderful, not just in blockbusters like War and Peace and Pulitzers. I think I might like to see more of our articles strung together thematically via Portals, but that's another day's issue. Inductiveloadtalk/contribs 17:35, 6 July 2020 (UTC)
      • @Inductiveload: We appear to be mostly in agreement. But… instead of me dropping another wall of text on the remaining points of disagreement, maybe that means we're in a position to try to hash out a draft guidance / policy type page with the rough framework? Then we could go at the remaining issues point by point. Because I think I'm in with a decent chance to persuade you to my point of view on at least some of them, but this thread is fast getting unwieldy (mostly my fault). It would also probably be easier for the community to relate to now, and much easier to lean on in the future. --Xover (talk) 18:31, 6 July 2020 (UTC)Reply[reply]
        • @Xover: If there are no more comments forthcoming after a couple of days, I think that makes sense. I don't want to railroad it: considering we have at least one !vote for "do nothing", I'd like to see if there are any other substantially different opinions floating about. Inductiveloadtalk/contribs 17:41, 7 July 2020 (UTC)Reply[reply]

The quantity of text here has grown far faster than my ability to absorb it, so rather than continue to put it off, here's my position: I don't see any problem with transcriptions that are scan-backed, even if the transcription only covers a small fraction of the entire scan. If Sally chooses (say) to transcribe a favorite story, that happened to be published in an issue of Harper's back in the 1890s, and goes to the trouble of uploading the full issue, but only creates pages for the one story that interests her, I think that's great. It doesn't matter to me whether she intends to work on the other pages or not. If it's not scan-backed, but it's fairly high quality, I am personally willing to do some work trying to locate a scan and match it up to the text; I'd rather we take that approach, than deletion, though of course deletion is the better option in some cases where the scan is very hard to come by.

If all this has been said above, or if I've misunderstood the topic, my apologies. Please take this comment or leave it, as appropriate. -Pete (talk) 02:00, 8 July 2020 (UTC)

Apologies, I see I had missed the point.

I disagree with Xover's statement that a top-level page for a publication, with a link only to a single article within the publication, has "near-zero value." Such a page can serve an important function linking content together in ways that help the reader (and search engines) find the content they're looking for, or understand the context around it. For instance, A Critical Dictionary of English Literature is linked from the relevant Wikidata entry. The banner on the Wikisource page clearly tells a Wikisource reader that they won't find a full transcription here; and with a simple edit, it could link to a full scan on another site, or (with perhaps a little more effort) even transcription links here on Wikisource. This page has been here since 2010; we don't have any way of knowing what links might have been created elsewhere in the intervening decade. (I do think that new pages like this should not be created without a scan at Commons to be linked to.) -Pete (talk) 02:12, 8 July 2020 (UTC)Reply[reply]

I'm really bad with walls of text, so I have only read a tiny portion of the above discussion. But I want to mention a couple of things that I think are worth considering in this discussion.
  • Most of the time, a mainspace "work" that is only a table of contents, but which has none of the actual content, and is not actively being worked on, can be (and should be) deleted as No meaningful content or history under our deletion policy.
  • A mainspace work that has only a little bit of content, but that content is a work unto itself within the scope of Wikisource, should be kept. Most periodicals are like this. For an example, see the Journal of English and Germanic Philology which only has one hosted article, but that hosted article is scan-backed and firmly within scope.
  • On some occasions, empty mainspace works do have value. I ended up creating the page The Roman Breviary, despite it containing no actual content, mostly because there are a lot of works that link to it, using many different titles, and if someone uploaded a copy of the work under one title then many of the links would remain red because they point to different titles of the work. This could be easily solved by creating redirects to a simple placeholder page, so I did. I tried to make the placeholder page as useful as a placeholder page can be, as it contains useful information about the history and authorship of the work, and links to the Index pages where the transcription will take place.

Anyway those are my 2 cents, sorry if they are redundant —Beleg Tâl (talk) 00:40, 29 July 2020 (UTC)


Since there has been no extra input for a month, and not wanting this section to get archived without at least attempting a proposal, I have started a proposal #Collective work inclusion criteria above. Inductiveloadtalk/contribs 11:00, 25 August 2020 (UTC)Reply[reply]

Since the proposal has now slipped off the main page (to here), with vague support for the first part (collective work inclusion criteria) and a fairly consistent opposition to the second (no-content pages), my plan is to transfer the first part, as guidelines rather than policy, to Wikisource:Periodical guidelines. As non-binding guidelines, they can then be worked on further in situ. Sound OK? Inductiveloadtalk/contribs 08:10, 16 April 2021 (UTC)Reply[reply]
The example given in Wikisource:Periodical guidelines might be improved, PSM is and was an exercise that has gone its own way (no offense to @Ineuw:, this is a site under development and that is only one example).CYGNIS INSIGNIS 13:05, 17 April 2021 (UTC)Reply[reply]
@Cygnis insignis: You would be wrong to think that I am offended. Remember that when I started, I knew everything. By now, so much of that knowledge is lost that I am happy to listen. Would you elaborate please? — Ineuw (talk) 19:50, 17 April 2021 (UTC)Reply[reply]

I've created Bradshaw's Monthly Railway and Steam Navigation Guide (XVI) - it couldn't be done on one page, due to the very high number of template transclusions. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:52, 1 September 2020 (UTC)Reply[reply]

@Pigsonthewing: The links in the toc on that page appear non-functional. Also, depending on just exactly which templates were the culprit, it is possible that you may be able to put all the content you wanted onto one page now due to some recent technical changes (template code moved to a Lua module which drastically improves performance and prevents hitting transclusion limits until much later). Xover (talk) 11:17, 14 September 2021 (UTC)Reply[reply]
Create the Draft namespace to hold substantially empty works? Then delete if no improvement after months?--Jusjih (talk) 19:22, 1 November 2021 (UTC)Reply[reply]
The issue is that the "substantially empty works" can have useful and complete content that stands alone. For example, an article from a scientific journal.
I would not want to see that either shunted into a Draft namespace to rot or deleted a few weeks down the line.
Index and Page namespaces provide our long term staging areas, and works can and do remain unfinished there for years. But what do we do when a self-contained piece of a larger work is ready? Inductiveloadtalk/contribs 20:29, 1 November 2021 (UTC)Reply[reply]

Subscribe to the This Month in Education newsletter - learn from others and share your stories

Dear community members,

Greetings from the EWOC Newsletter team and the education team at Wikimedia Foundation. We are very excited to share that, in the tenth year of the Education Newsletter (This Month in Education), we invite you to join us by subscribing to the newsletter on your talk page or by sharing your activities in the upcoming newsletters. The Wikimedia Education newsletter is a monthly newsletter that collects articles written by community members using Wikimedia projects in education around the world, and it is published by the EWOC Newsletter team in collaboration with the Education team. These stories can bring you new ideas to try and valuable insights into the successes and challenges of our community members in running education programs in their context.

If your affiliate/language project is developing its own education initiatives, please remember to take advantage of this newsletter to publish your stories with the wider movement that shares your passion for education. You can submit newsletter articles in your own language or submit bilingual articles for the education newsletter. For the month of January the deadline to submit articles is on the 20th January. We look forward to reading your stories.

Older versions of this newsletter can be found in the complete archive.

More information about the newsletter can be found at Education/Newsletter/About.

For more information, please contact spatnaik

Hi, Does anyone know where scans of this journal could be found, including under the former names of w:The Single Tax and w:Land Values? Thanks, Yann (talk) 21:51, 16 April 2023 (UTC)Reply[reply]

@Yann: Why not email them and ask? I'd reckon they'd know if anyone has them available, or has scans. — billinghurst sDrewth 05:40, 20 April 2023 (UTC)
Thanks for your answer. Yes, that was the right thing to do. All issues are available online. I am not sure about the copyright status in UK. Which issues could be uploaded to Commons? I got the first one: Index:The Single Tax, Vol. 1 - June 1894-May 1895.pdf. Yann (talk) 14:37, 21 April 2023 (UTC)
Portal:Land&Liberty. Yann (talk) 14:54, 22 April 2023 (UTC)
@Yann: If the copyright is uncertain, then upload locally, and put notes on the Index_talk: page for things to be checked and moved once certainty is established. Probably can put notes on the portal page too. Easier to upload here and migrate to Commons, than the reverse. — billinghurst sDrewth 23:32, 17 May 2023 (UTC)

User pages, and the over-intervention

I would like to suggest to our community members that people's user pages, and to some extent their user talk pages, are being edited too busily by other users for no clear reason. If a user has an error,[1] a red link, or whatever on their user page which has no flow-on impact, that is usually only the business of that user. If one is concerned about that error/red link/whatever, then the best course of action is to mention it to the person on their user talk page. If they fix it, great; if they don't, so be it, move on. Our presentation pages are those that are within the default for the search engines as defined by content in the respective namespace.

I don't think that we have to have policy or written direction about this; it is simply a courtesy to let people have and manage their pages in those aligned namespaces.

  1. in user ns, and only impacting user: ns, no flow on to the community

billinghurst sDrewth 06:05, 27 April 2023 (UTC)

I agree with the general principle of leaving user pages alone, and that for active users the first stop is a message on their talk page, both as a courtesy. However, I think that for straightforward technical fixes (the kind you'd do with a bot, say) we shouldn't fuss so much about User: namespace pages being holy. For example, the syntaxhighlight extension switched from the tag name "source" to "syntaxhighlight" several years ago, and is now adding tracking categories for uses of the old name prior to dropping support for it. Running a bot through to replace the tag name should be entirely uncontroversial: this is a public and collaborative wiki, not your social media profile, so your User: space is only personal to the extent that it serves the project (cf. why you can't put just anything there: copyvios, advertising, attack pages, etc. are no more permitted in User: space than anywhere else). It's a matter of not interfering with a user's User: pages, more than not ever editing them.
And for most of the maintenance and tracking categories it is necessary to remove the backlogs of trivial stuff so that real issues can be detected and fixed, and so that newly created issues can be detected. Cf. the CommonsDelinker issue and Category:Pages with missing files. The LintErrors fixes are kind of borderland in that there are so many of them and they usually have no visible effect currently, and are there mostly to prepare the ground for changes to MediaWiki's parser in the future. But as with most things I think we should not get religious on either side of this issue: User: namespace pages should be approached with respect and courtesy for the contributor, but are in no way sacrosanct and exempt from necessary maintenance; and maintenance and tracking cats should be processed and worked on, but not at any cost and not all issues are equally important.
Similarly with /Archive pages: in general these should not be edited, and if they must be edited then the change should change the historic page as little as possible. But if whatever maintenance issue it is is non-trivial then, yes, we do need to edit even /Archive pages. Its original state and the change made to it will be visible in the revision history so it's not that big of a deal. "Avoid it when possible, but do it if you must." Xover (talk) 07:08, 27 April 2023 (UTC)Reply[reply]
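For illustration only, the kind of mechanical tag-name fix mentioned above (old "source" tag to "syntaxhighlight") might be sketched in Python roughly as follows. The function name rename_source_tags is hypothetical and this is not any existing bot's code; it assumes plain wikitext input and, as a known limitation, does not try to skip tags quoted inside nowiki or pre blocks, which a real bot run would need to handle.

```python
import re

# Matches the start of an opening "<source ...>" or closing "</source>" tag.
# The lookahead requires whitespace, ">" or "/" after the name, so longer
# names such as "<sourcecode>" are left untouched.
_SOURCE_TAG = re.compile(r"<(/?)source(?=[\s>/])", re.IGNORECASE)


def rename_source_tags(wikitext: str) -> str:
    """Return wikitext with <source> tags renamed to <syntaxhighlight>.

    Hypothetical helper for illustration; attributes and tag contents are
    left exactly as they were.
    """
    return _SOURCE_TAG.sub(r"<\1syntaxhighlight", wikitext)


if __name__ == "__main__":
    sample = '<source lang="python">print("hi")</source>'
    print(rename_source_tags(sample))
```

Because only the tag name changes, a diff of such an edit stays minimal, which is in keeping with the point about changing the historic page as little as possible.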
If we are looking to run uncontroversial clean-ups, then we should have that as a general open point of reference/conversation/background. This then also has a diff that can be part of any summary. Noting that I did focus on user pages, and I did qualify it for being no flow-on to the community. And yes it is a collaborative wiki, however, user ns is less collaborative, it is the one space that should have a really good reason to edit; and that is not what I am addressing. There are enough public facing and important issues that need fixing, so keeping out of user ns is desirable where possible. — billinghurst sDrewth 07:31, 27 April 2023 (UTC)Reply[reply]

Do we want CommonsDelinker bot on enWS?

CommonsDelinker is a bot (running without bot flag and without bot approval, incidentally, which we should fix either way) that checks file deletion logs on Commons and then removes now-broken references to that file on other projects. The idea is to prevent red links and broken images visible to readers. On Wikipedias this is a great idea. However, on Wikisource this frequently happens in Page: namespace pages, like this, where they will usually go unnoticed. Our page editing pattern is such that any given Page: namespace page will only have one, or at most two, watchers—whereas a Wikipedia article is likely to have many—and in a large number of cases those contributors are not even active any more.

When this happens we will likely never discover the problem. Whereas if the image reference gets broken (points at a non-existent file) it will show up in Category:Pages with missing files where we can track them and fix them.

I'd therefore like to solicit some input on whether it makes sense to run CommonsDelinker here, or if we should disable it.

Pinging Billinghurst who may have some relevant input; and grin, the bot operator, as a courtesy / for their perspective. Xover (talk) 06:46, 27 April 2023 (UTC)

By the by, I have fixed the CDL issue for Mold Web Course which is the referred diff, and the other issues with the reproduction. — billinghurst sDrewth 08:21, 27 April 2023 (UTC)
@Xover: We have an AF that checks and reports every edit by CDL. I then check every edit it makes, and follow-up at Commons as required; and I have a wide range of responses and actions that I undertake. The bot is not acting as a bot with rights, and it is behaving exactly as we want it to do and with the visibility that we require. The fact is that we want to see what Commons is doing to our files, and the CDL is the best way to see, as there is no ready way to see files nominated for deletion, so we are always acting reactively. Can I say that it is far worse where Commons deletes the pdf/djvu file as CDL cannot remove the Index: or Page: files, so they happen quietly and more lethally. And I am not the bot operator and have never been, though as part of my Commons admin role, I can occasionally manage deletions there for which CDL may then do its cleanup. — billinghurst sDrewth 07:20, 27 April 2023 (UTC)
Sure, but using AF and contribs requires someone to actually watch contribs all the time; and when you get busy IRL they go unnoticed again. It becomes a "bus factor" issue: "how many people have to get hit by a bus before the system breaks down?" Right now I suspect that factor is one. Keeping the breakage visible in the maintenance categories allows a much wider set of contributors to monitor for such issues and work the backlog.
(Which reminds me, I should get off my behind and make a list of tracking cats we should try to monitor more actively to fix problems as they occur) Xover (talk) 07:33, 27 April 2023 (UTC)Reply[reply]
@Xover: Sure, though missing files is just going to show that they are not held locally, not that they were uploaded and deleted at Commons. There is also no real-time ability to note their deletion, and it relies on the whim of someone needing to go to the category, then having to know what to do from first principles, then going to Commons, and knowing how to get files undeleted, and the long argument from a position of weakness with an admin. My plan works way better, though yes it does rely on me, so it fails the bus-on-person test, whereas yours passes the bus test but you have no buses in the neighbourhood on a quiet rural lane, no map, no keys and a long walk to the station.

Can I say that the Special:RecentChanges are public, Special:Contributions/CommonsDelinker is public, that AF is public, and so are the logs for that filter, so any person looking can see it, so there is actually more potential visibility, even if they don't have the fixing tools.— billinghurst sDrewth 08:00, 27 April 2023 (UTC)Reply[reply]

Also noting that the CDL edits from the abusefilter all have a tag on them to enable review, so you can see them through the tags filter in Special:RC. So what I have done is added some text to identify that it needs review. What we should be doing is creating target pages within Wikisource:Maintenance (or alternatively Wikisource:Maintenance of the Month) that we point to, and describe the process for resolution. — billinghurst sDrewth 08:09, 27 April 2023 (UTC)Reply[reply]
Oh, yes, indeed: irrespective of this particular issue I whole-heartedly agree we should be better at facilitating collaborative approaches to maintenance such as through Wikisource:Maintenance. Teaching lots of people to fish is bound to be more important than any kind of technical nitty-gritty of how we track the fishing spots. Xover (talk) 08:16, 27 April 2023 (UTC)Reply[reply]
FYI, I did notice that Contemporary English woodcuts lost a large number of woodcuts when they were deleted at Commons, e.g. [1]. It would be nice if they could be moved and hosted on wikisource instead. MarkLSteadman (talk) 12:44, 28 April 2023 (UTC)Reply[reply]
Excellent example: here CommonsDelinker removed the image and nobody noticed, so it's been sitting broken for over a year. If we didn't use CommonsDelinker the image would have sat as a redlink (broken image reference) and would have shown up in Category:Pages with missing files. Even if nobody was watching that category at the time, the fact it was broken would have stayed visible even after the edit itself scrolls off of Special:RecentChanges and Special:AbuseLog. Right now we have no idea how many images have been removed by CommonsDelinker that nobody noticed. Xover (talk) 13:23, 28 April 2023 (UTC)Reply[reply]
Note the automated renaming of CommonsDelinker when the commons file changes is very useful. MarkLSteadman (talk) 14:50, 28 April 2023 (UTC)
@Xover: We do know what is happening with CDL edits ... Special:Contributions/CommonsDelinker everything is clear. You can also see what has been reverted and what has not by the Tags => Tags: Reverted CommonsDelinker. I will recover and migrate those identified missing images in the next couple of days. Moving them over to here is now such a painful task unfortunately. — billinghurst sDrewth 12:00, 30 April 2023 (UTC)Reply[reply]
Some page space images that have been removed, Page:Territory in Bird Life by Henry Eliot Howard (London, John Murray edition).djvu/97, Page:Territory in Bird Life by Henry Eliot Howard (London, John Murray edition).djvu/159, Page:The Life and Letters of Raja Rammohun Roy.djvu/9, Page:The Story of Doctor Dolittle.djvu/1, Page:British Reptiles, Amphibians, and Fresh-water Fishes.djvu/9, Page:Beatingtheinvader.djvu/1. Page:Every Woman's Encyclopedia Volume 1.djvu/510. MarkLSteadman (talk) 23:28, 30 April 2023 (UTC)Reply[reply]
Right. It's not that we can't find out what it's done, it's that the logs-based approach is best suited to 1) realtime monitoring of what's happening now and 2) research into what happened historically. If nobody's watching when the edits happen they'll scroll off the horizon and someone needs to go actively looking for them, unlike a tracking category that we can show in relevant places as Category:Pages with missing files (27). For tracking something that should not get lost but can not necessarily be fixed immediately, a tracking category where items stay permanently until the problem is fixed is less fragile and easier to handle. We can't base stuff like this on a single contributor never getting sick (or fed up, or busy IRL, or ...), and we have enough problems getting patrolling coverage of the critical stuff at Special:RecentChanges without adding more such real-time tasks. I think we should seriously consider whether this particular CommonsDelinker task is the best way to do it, or whether letting image references break and automatically end up in the maintenance category would not be a better approach. Xover (talk) 09:48, 1 May 2023 (UTC)Reply[reply]
It is really not helped when it is one of our own community members who nominates these for deletion prior to them being repatriated.
Sidenote: you can detect what the bot changes by following its contributions. Maybe it's more fiddly than looking at broken images list: pick which is best for you. -- grin 18:19, 3 May 2023 (UTC)Reply[reply]

I will accept whatever you people decide (preferably with voting or by other wide-coverage method); the bot can skip enwikisource, and you'll have broken image links which can be handled individually. --grin 18:16, 3 May 2023 (UTC)Reply[reply]

  •   Keep Yes we want CDL here. No, we don't want to have to rename files that are moved. We have been pretty well managing the deletions that have occurred. Come up with a better solution prior to making more work and more holes. We have a solution that is working, yours is not a solution, just a bigger evident problem. — billinghurst sDrewth 21:31, 3 May 2023 (UTC)
    @Billinghurst: To be clear, when I'm suggesting we stop using CommonsDelinker for deleted images that does not necessarily mean dropping it for moved files. I'm sure grin could disable just the one task for enWS but leave the other one running. Xover (talk) 05:30, 5 May 2023 (UTC)
    Still not in favour of your proposal. — billinghurst sDrewth 10:01, 5 May 2023 (UTC)
    Fair enough. You're the one putting in the work here, so your voice has weight on this. But let's just say it's not like you're likely to run out of tasks any time soon, and making it practical for others to take some of that load would have been a nice bonus. Xover (talk) 10:13, 5 May 2023 (UTC)

A maintenance thread

Hi to all. As mentioned in the previous section, there have been questions about where we are with maintenance, and with its coordination. We have

which were our attempts to get us on track years ago, and probably have fallen away since.

I know that I have my own maintenance schedule, scripts, and the like that I potter through--on what seems like one monotonous journey of (re)painting a bridge--to the point that my actual transcription editing is few and far between these days. I also know that doing new works and new information provision is always more fun than maintenance.


  • Are there people interested in undertaking maintenance tasks?
  • What maintenance tasks do you wish to see?
  • What maintenance tasks would you have an interest in undertaking?
  • What information and guidance would you expect to see for your tasks?
  • Do you sort of want to own your task? Or do you just wish to lucky dip tasks that are there?
  • New one-offs? Or repeated, regular tasks?
  • Does anyone have an interest in coordinating/overseeing?
  • Does anyone have a good view of how maintenance has been done elsewhere that would give us a good starting model?

(and your own questions, opinions, knowledge hereafter) — billinghurst sDrewth 03:18, 28 April 2023 (UTC)

Mulling this one over still (thanks for taking the initiative!), but… Regarding Maintenance of the Month: it'd be a nice thing to revive, but it's also one of those things that would require someone to run the process (light, or more extensive, but someone would need to run it either way), so I think maybe reviving that is not where we first start. Keeping it in mind as a tool in our toolbox for organising efforts would be smart though, especially if someone volunteers to be the one to manage the process. Xover (talk) 10:19, 5 May 2023 (UTC)Reply[reply]

1928 PD (2023) scan project

Looking back at the past few years of Public Domain Day, I think it would be a good idea to have a few scans in store for the change of year. This is especially so because many of the books just entering the public domain won’t have scans on the major scan-hosters because of the up-to-then existing copyright restrictions. I have access through my library to a lot of the books on WorldCat, and would be willing to scan them in before the new year. (It’s a little early now, I know, but I thought it would be better to ask now rather than to push it off until quite late.) I’m thinking of items listed at Wikisource:Requested texts/1928, but I would be especially interested in works where editors promise to proofread the works once they are ready. I would create scans in the coming months, and hold on to them until January 1, 2024, when I would upload them to Wikimedia Commons (or here, if there are other copyrights), so work can begin right away. TE(æ)A,ea. (talk) 02:10, 29 April 2023 (UTC)Reply[reply]

I am interested in Satyagraha in South Africa by M. K. Gandhi, and The Mystery of the Blue Train by Agatha Christie. And there are quite a number of short stories by Christie which we don't have yet, but are already in the PD in USA. Yann (talk) 11:35, 29 April 2023 (UTC)
  • Yann: Satyagraha in South Africa (OCLC 7965175) and The mystery of the blue train (OCLC 315963986, reserve OCLC 1075695763) have been called upon. I’ll also provide updates as the process of getting and scanning the books continues. TE(æ)A,ea. (talk) 12:05, 29 April 2023 (UTC)Reply[reply]
    • Yann: Satyagraha in South Africa has just come in (and The mystery of the blue train should be here soon). It is a revised second edition (from the 1950s), which itself carries no copyright notice (apart from the original, 1928 notice). Is that acceptable? TE(æ)A,ea. (talk) 20:42, 10 May 2023 (UTC)Reply[reply]
  • @TE(æ)A,ea.: It is in the public domain in India. I guess the copyright status in USA depends on the amount of changes from the first edition. Difficult to say without having both. Yann (talk) 20:49, 10 May 2023 (UTC)
  • Yann: The copyright page states: “First Edition, 1928/Revised Second Edition, December 1950/Third Impression, August 1961, Fourteenth Thousand/Rupees Four/The Navajivan Trust, 1928”. On page viii: “TRANSLATOR’S NOTE/(Second edition)/This is a reprint of the first edition except for some verbal alterations suggested by my friend Shri Verrier Elwin who was good enough to go through the translation at my request.” Those changes may be marked in the six corrigenda. Is that good enough? TE(æ)A,ea. (talk) 15:33, 11 May 2023 (UTC)
I would like to work on The Masqueraders by Georgette Heyer. (And hopefully by next year I'll be done with her previous Georgian novels!) —CalendulaAsteraceae (talkcontribs) 21:55, 30 April 2023 (UTC)Reply[reply]
@TE(æ)A,ea.: If possible, I would be very interested in books "A forest story" by Josef Kožíšek (here) and "The Magic Flutes" by the same author (here and here). They were published in 1929 but their copyright was not renewed. What do you think, would it be possible to get them? --

License templates on files

Hello everyone! According to wmf:Resolution:Licensing_policy all files on wiki projects need a valid license template. According to Special:Statistics there are 18,693 files here right now.

So I added {{free media}} to a number of license templates to make the files show up in Category:All free media. But it seems that only a few files will show up there (perhaps 1-2k).

So I wondered if there are really thousands of 'illegal' files on wikisource?

Then I noticed Category:Raw page scans for missing images with 16,215 files. I checked a number of random files and they did not have a license. But they are an extract from files on Commons where there should be a license.

My question is what to do. Should the files be moved to Commons or are they needed locally? Is there an easy way to add the missing license? --MGA73 (talk) 18:29, 29 April 2023 (UTC)Reply[reply]

  • MGA73: I don’t see that requirement here, but I’m not too knowledgeable of the exact rules. Those files have proper licenses through the files from which they are respectively derived. The files in that category were created as part of a process which now no longer creates local images. The files in that category have been gradually replaced with ones uploaded either here or on Wikimedia Commons, as appropriate. After that occurs, the person who created the new image should nominate the one here for deletion; however, that has not happened in a number of cases. I have gone through the category in the past—it used to have over 20,000 files—and will probably go through it again this summer, when I have more time. Thank you for reminding me of the category. As for your last paragraph, besides the review, no work in particular is needed. TE(æ)A,ea. (talk) 18:50, 29 April 2023 (UTC)Reply[reply]
  • The {{free media}} categorization implies that the work is out of copyright worldwide, and for some files and for some licensing templates that is not true. I have reverted your addition of the category to several templates, and other folks here better acquainted with the copyright policies may revert more. Some files housed here in Wikisource should not be moved to Commons because they do not satisfy the requirements for hosting on Commons. The criteria for Commons are more stringent than those applied on Wikisource. In future, if you have a question, it is better to ask before changing a whole array of widely-used templates. --EncycloPetey (talk) 19:11, 29 April 2023 (UTC)Reply[reply]
    Indeed. Please revert these changes. Xover (talk) 19:20, 29 April 2023 (UTC)Reply[reply]
@MGA73: Long story short: these are a non-issue in terms of licensing, and they're a manual backlog we'll finish eventually.
The longer version is that these are high-res versions of individual pages that contain illustrations and plates from scanned books. They were bot-uploaded to serve as placeholders until someone got around to properly extract the plate (it's used automatically by {{raw image}}/{{missing image}}). For various reasons we don't use these any more (new ones are not being uploaded), and most of them will show up on Special:UnusedFiles (well, eventually; it only caches 5k at a pop), but a previous discussion determined that we have to evaluate each of them individually (because reasons, I guess). So I chip away at them whenever I have a spare moment, and TE(æ)A,ea. has been checking larger batches, so we're now down from ~25k to ~15k.
We host everything on Commons unless it is 1) out of scope for Commons (i.e. local screenshots and stuff) or 2) incompatible with Commons' licensing policy. enWS licensing policy allows works that are PD only in the US, while Commons requires US and the country of origin, so we have a significant number of local files for that reason. The majority have proper license tags, but trying to identify and fix the ones that don't is fairly hopeless until we clear the above backlog. I'm pretty sure we don't have a significant problem with unlicensed local file uploads; it's just an issue of the usual housekeeping. Xover (talk) 19:17, 29 April 2023 (UTC)Reply[reply]
After working through a large number of files we are now down to ~4k in Special:UnusedFiles so the 5k cache limit should no longer be a concern. MarkLSteadman (talk) 22:30, 12 May 2023 (UTC)Reply[reply]
@MarkLSteadman: Wow! Between you and TE(æ)A,ea. an intractably large job has now become something we can expect to finish in some reasonable time frame. Thank you!
The bad news is that we still have 15k+ files sitting in Category:Raw page scans for missing images that are of dubious value, but still require manual intervention. Xover (talk) 10:09, 13 May 2023 (UTC)Reply[reply]
  • TE(æ)A,ea. According to Wikisource:What is Wikisource?, "Wikisource – The Free Library – is a Wikimedia Foundation project". As a Wikimedia Foundation project Wikisource must follow the rules set out by WMF, and that includes the wmf:Resolution:Licensing_policy. So it is a requirement that the files have a license. Files without a license should be deleted. On Wikipedia it happens after 7 days. Same should happen here.
It takes a long time to check manually, so that is why it was made a global requirement that files have a machine-readable license and why the two categories above were created.
User:EncycloPetey I know that not all files can be moved to Commons. But because you reverted my edits I can't easily make a list of files without a license. Wikipedia has w:Template:Free in US media but that does not exist here yet ({{Free in US media}}). Perhaps create that?
You can certainly ask for help identifying problematic files. Someone may be able to assist you with accumulating data. As I say, starting by changing Templates to cause a large number of pages to be re-rendered across the whole of Wikisource is not the best approach. Your edits will cause thousands of pages to be changed, and not just the files you're interested in, but every file on this site with one of those licenses. And as you aren't familiar with the licensing requirements here on Wikisource, how do you propose to help? --EncycloPetey (talk) 19:49, 29 April 2023 (UTC)
EncycloPetey if you can fix the problems without my assistance that is fine by me. But since there are files like File:CompassAlmostAsleep.jpg that were uploaded in 2008 without a license it seems that there is something in your procedure that does not catch all files without a license. If you fix the files in Category:Files with no machine-readable license that should hopefully catch most of the files without a license. --MGA73 (talk) 20:00, 29 April 2023 (UTC)
@MGA73: I'm sure we have a bunch of files that are not properly tagged, and a subset of those that are incompatibly licensed. Just as we have texts sitting in mainspace that lack license tags and a subset of those that are not compatibly licensed. We are aware of the problem areas and the scope of the problems (nowhere near critical), and we are dealing with them deliberately and in the way we have determined is best for our project. Our approach is very different to enWP (or Commons for that matter), so not employing a 7-day hard automatic delete for lack of license tagging is a deliberate choice (we find it user-hostile and counter-productive). So while I would certainly prefer we got through our backlog faster, we're steadily processing this backlog as we are all our other backlogs and with a priority commensurate with its importance. We're always happy to have more help, but then I might suggest starting with scan-backing some of our ~200k naked texts, or fixing the ones in Category:Index - File to fix. There's also any number of projects in Category:Index Not-Proofread where more help would be very welcome. Xover (talk) 20:17, 29 April 2023 (UTC)Reply[reply]
Xover I think that all wikis have more work than they have hands. But copyright should be taken very seriously because copyright problems represent a legal risk to the project. All files that do not meet the requirements MUST BE DELETED! Check this old diff by Jimbo. He even said that if things did not improve dramatically they would close local uploads except for users who have earned the right. So if Wikisource hosts unlicensed text it should be deleted even if you do not like to do so. I think perhaps some wikis keep files for 14 days so I guess that will be okay too. But I do not think that Wikisource can decide to keep known unlicensed text for longer than that. --MGA73 (talk) 20:46, 29 April 2023 (UTC)
I'm sorry, but do you realise you are now implying that we do not take copyright seriously, screaming in all caps, appealing to Jimbo, and making bald assertions about what we "must" do?
English Wikisource has a licensing policy that is fully in accordance with the terms of service, and we actively and effectively enforce that policy. We delete blatant copyvio immediately, and generally delete incompatibly licensed matter as soon as a proper determination of copyright status has been made.
What we do not do is mechanically delete anything just because some poor newbie hasn't figured out how to add all the right technical gobbledygook within some arbitrarily chosen deadline. Or because the file happened to have been uploaded back in the dawn of time before the standards were raised to their current level and we haven't gotten around to fixing it yet. We don't do those things because they are user-hostile and counter-productive, and because our experience shows us that very rarely are these actual copyvios; they just lack the right tagging.
So while we appreciate anyone wanting to help, this is an area we can manage just fine by ourselves (unless you actually want to roll up your sleeves and help out with the not-so-glamorous task of manually checking everything on Special:UnusedFiles and tagging anything we can safely delete). User:MGA73/Sandbox looks possibly useful as a manageable subset we can check and maybe improve sooner than we otherwise would have gotten to them, but I also have to note that the first ~50 entries (~10%) there I happen to personally know exactly what they are and why they're there (half are newbie uploads that should be on Commons, the rest are temporary files that will be deleted). I'll be very surprised if there is a significant number of actual licensing problems in there. Xover (talk) 22:31, 29 April 2023 (UTC)Reply[reply]
  • MGA73: I agree entirely with Xover, except as to your usefulness. (A quick look at your sandbox page notices many hundreds of works marked with license templates, for example.) I may mention also that engaging in legal threats is most unbecoming of a user, and likely violates one of those “resolutions” to which you like citing; this is especially in the case of a new, inexperienced user such as yourself, who has also recently undertaken the modification of a number of highly-used templates without community consensus. We here are perfectly able to enforce our own policy, and do not need our policy to be redirected by someone from another site. TE(æ)A,ea. (talk) 23:26, 29 April 2023 (UTC)Reply[reply]
Xover it usually only takes 1 minute to find files without a license provided the license templates have {{free media}} etc. So perhaps create {{Free in US media}} and add that instead? Then I could make a list to see if there are any files without a license besides those in Category:Raw page scans for missing images.
Everyone. Since the files in Category:Raw page scans for missing images will be deleted and they link to Commons where there is a license then I agree that it is not a big problem that they do not have a formal license template. But unless you are sure that all other files have a valid license I suggest that you add one of the templates above to the license templates. --MGA73 (talk) 19:43, 29 April 2023 (UTC)Reply[reply]
  • EncycloPetey Perhaps you would like to explain why {{CC-BY-SA-3.0}} does not qualify for {{free media}}? Commons accept files with CC-BY-SA-3.0 and w:Template:Cc-by-sa-3.0 include {{free media}}. --MGA73 (talk) 20:13, 29 April 2023 (UTC)Reply[reply]
    • MGA73: All of your template changes were reverted. Please discuss such changes with the community before making them, in the future. TE(æ)A,ea. (talk) 20:21, 29 April 2023 (UTC)Reply[reply]
      • TE(æ)A,ea. okay. I do not need those changes if the project can find and fix unlicensed files without my assistance. As written above copyright should be taken very seriously. --MGA73 (talk) 20:46, 29 April 2023 (UTC)
  • I made a list at User:MGA73/Sandbox of files that possibly do not have a license. The list may not be as good as it could be because I excluded a wide number of categories. Some files are there because the license template does not add the files to a category like "PD in xxx". If you fix the template the files will be removed when I update the list. If you ping me I can make an update. --MGA73 (talk) 21:07, 29 April 2023 (UTC)
  • @Xover, @TE(æ)A,ea.: I decided to jump to the bottom because I think it makes it easier to read and because it's a follow up on more of the discussions above.
To begin with I would like to clarify that I came here to help out and I do that on many wikis. Usually that works out fine but for some reason things went off track here. I'm sorry for my part in that happening.
You may think I'm a newbie because I do not have many edits here but I have millions of edits related to images on this account and my 2 bots so I do think I know something about copyright.
English is not my native language so that may be a part of the reason things went wrong.
First I read the comment "I don’t see that requirement here, but I’m not too knowledgeable of the exact rules." as a "I'm not sure Wikisource has to follow the WMF:Resolution:Licensing_policy". That's why I tried to explain that Wikisource must follow it because it's a Wikimedia project.
Second I read the comment about not deleting unlicensed text/files and the "nowhere near critical" as "We know we have a lot of copyright issues but we do not think it's a big problem". That's why I linked to the old comment from Jimbo (I did not ping him or call for his help).
However, I do worry a bit when I read "our own policy, and do not need our policy to be redirected by someone from another site". If it is to be read as "We do not care what WMF have written in their policy" then I think we have a major problem. If it's just frustration that I pointed to a problem area in an undiplomatic way, then I can only repeat that it was not intended that way.
As for the value of the list on User:MGA73/Sandbox I would like to repeat that "Some files are there because the license template does not add the files to a category like "PD in xxx"." So I know some files have a license and as I wrote it can be fixed if you edit the template and make it add the files to a category like all the other license templates. (Perhaps on {{Legislation-SGGov}} add {{PD-EdictGov}}).
Xover, I do not mind helping out and I do that a lot on many wikis. One of the latest tasks was jawiki where I helped fix 80k files. But when I'm told that "We here are perfectly able to enforce our own policy..." that does not really motivate me to help out. But I have made the list so it's now easy to go through the files and enforce the policy as stated.
Last thing. If not already done you might want to try to make a similar list of unlicensed text in main space etc. --MGA73 (talk) 10:22, 30 April 2023 (UTC)
@MGA73: No worries, and I'm sure we share the blame for not keeping this from going off into the weeds.
I'm trying here to concisely address your concerns: 1) we are very much aware of the scope for local autonomy on licensing issues that the WMF affords us, and are operating within those boundaries; 2) our licensing policy, community practices, and enforcement of these do not differ markedly from other projects, and our enforcement is effective; 3) we are confident we do not have any significant problem with copyvio lurking in our backlogs, because we are aware of the issue and have spent quite a bit of effort on cleaning up things that are not properly tagged etc. and the proportion we find that have actual licensing problems (i.e. are copyvios) is minuscule; 4) because we are confident we do not have any "lurking unnoticed copyvio" problem worth speaking of, we do not want to summarily delete things that have deficient tagging as the vast majority of what we deleted would be perfectly valid, in-scope, and properly licensed content that just failed to observe all formalities possibly a decade before those formalities were made explicit.
So when we do not consider lack of machine-readable licenses on files as a critical problem it is not because we do not care about appropriate licensing, but because we have ample reason to believe that it is purely a technicality (with possibly a trivial amount of actually problematic files). We do work on cleaning it up, but it competes for priority with all our other maintenance backlogs of the same type. And most of the low-hanging fruit has been picked, so, for example, what lacks tagging in mainspace (which we do track through various means) consists of a lot of stuff that needs detailed research to identify actual copyright status. I've personally spent countless hours on such issues and the vast majority I end up tagging with a compatible license rather than bring to WS:CV. Of those brought to WS:CV a large proportion are saved through the diligent research of other contributors. The content I've found here that is blatant enough that it can be speedied is a trivial amount, and even these require at least some research to verify.
So while we do aim to get to an end state where every file and every text we host has an appropriate license tag, we are going about that with slow deliberation (quality over quantity). We also don't feel it likely that we have any significant blind spots here, so while we always welcome more hands to help we don't feel we need any external help pointing up stuff that just lacks tagging or similar (we already know and are tracking it).
If you want to help the most effective thing you could do is start going through Special:UnusedFiles, checking that the raw page scan has been replaced with a properly extracted image (cropped, color-corrected, etc.), and then tag it for {{sdelete|A1 transwikied}}. That's our biggest backlog that is generating noise for other cleanup efforts (including getting proper license tagging in place), and it's going slow because it requires manual (human) verification and lots of clicking around on slow-loading pages (MediaWiki's multi-page file support is pretty crappy). Xover (talk) 11:05, 30 April 2023 (UTC)Reply[reply]
@Xover Thanks a lot for your reply.
The reason I wanted to make a list is that in many cases a wiki does not have an easy way to find the files without a license (if it did, they would probably have been fixed long ago). If added to a list or put in a category it is very easy to work on the files.
As for the 16k files my thought was that if the files were needed I could perhaps move them all to Commons with FileImporter and fix the missing license etc. with my bot.
But for now I plan to work on other wikis. --MGA73 (talk) 11:36, 30 April 2023 (UTC)Reply[reply]
I updated the list. As written above the list would have been shorter if {{Legislation-SGGov}} was fixed. --MGA73 (talk) 19:38, 12 May 2023 (UTC)Reply[reply]
@MGA73: What exactly is the problem with {{Legislation-SGGov}} as you see it? Xover (talk) 20:57, 12 May 2023 (UTC)Reply[reply]
It doesn't add a category that is a subcategory of Category:Works by license like the other license / government templates, e.g. it should categorize into a subcat of Category:PD-EdictGov. MarkLSteadman (talk) 22:25, 12 May 2023 (UTC)
Xover Yes as MarkLSteadman says there is no PD-category. Perhaps on {{Legislation-SGGov}} add {{PD-EdictGov}}. --MGA73 (talk) 06:09, 13 May 2023 (UTC)Reply[reply]
  Done. I see we have quite a bit of cleanup to do in this area, not least is making the license cats hidden, consistent naming, and so forth. But for now it should get the SGGov works out of MGA73's list. Xover (talk) 08:21, 13 May 2023 (UTC)Reply[reply]
Updated list again. It removed one third of the files. And yes cleaning up license templates is a nice little project. But I have seen much worse. Some wikis do not have Category:License templates so there it is hard to find the license templates. --MGA73 (talk) 18:21, 13 May 2023 (UTC)Reply[reply]
Most of the remainder are tagged {{Legislation-HKGov}} or {{Legislation-CAGov}} which appear in the list for the same reason as {{Legislation-SGGov}} did. MarkLSteadman (talk) 20:10, 13 May 2023 (UTC)Reply[reply]
I've fixed {{Legislation-HKGov}}. {{Legislation-CAGov}} was adding the right category (Category:Legislation-CAGov) but the category was not set up as a subcategory of Category:Works by license. I've standardised the category page now so these should all show up in the right place. Xover (talk) 11:15, 15 May 2023 (UTC)Reply[reply]
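The categorisation problem worked through above can be illustrated with a rough sketch. It assumes a file counts as licensed when at least one of its categories is Category:Works by license or a descendant of it; the file names and the category tree below are hypothetical examples, and this is not the query actually used to build the list at User:MGA73/Sandbox.

```python
from collections import deque

ROOT = "Category:Works by license"


def is_under(category, root, parents):
    """Return True if `category` is `root` or a descendant of it.

    `parents` maps a category name to the categories it sits in.
    """
    seen = set()
    queue = deque([category])
    while queue:
        current = queue.popleft()
        if current == root:
            return True
        if current in seen:
            continue  # guard against category loops
        seen.add(current)
        queue.extend(parents.get(current, ()))
    return False


def files_without_license(file_categories, parents, root=ROOT):
    """List files none of whose categories descend from `root`."""
    return sorted(
        name
        for name, cats in file_categories.items()
        if not any(is_under(cat, root, parents) for cat in cats)
    )


# Hypothetical data: the CAGov category exists but has no parent yet.
parents = {
    "Category:PD-EdictGov": ["Category:Works by license"],
    "Category:Legislation-CAGov": [],
}
files = {
    "File:Example SG act.pdf": ["Category:PD-EdictGov"],
    "File:Example CA act.pdf": ["Category:Legislation-CAGov"],
    "File:Example untagged.jpg": [],
}
print(files_without_license(files, parents))
```

In this sketch, a correctly tagged file still appears in the output while its license category is not connected to the root, which matches the {{Legislation-CAGov}} case; adding the parent link removes it without touching the file.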

OED1, again

There's an old 2013 discussion about the possibility of taking on the first, 1928 edition of the Oxford English Dictionary. At least one thing has changed since then: the 96th year after the publication of OED1, 2024, is now only months away. (But unfortunately the first supplement didn't come out until 1933.) Come to think of it, another thing which seems to have changed in the past decade or so is that access to digital editions of the OED has become even more expensive and limited, on the whole. A high-quality free transcription of OED1 would be a good 80% solution to that problem (at least once the first supplement can be added to it). Hopefully it would impose some welcome competitive pressure on proprietary dictionary-makers, to boot. Are people minded to begin work on an OED1 transcription on English Wikisource, or indeed anywhere else? (I couldn't find any discussion of the issue on English Wiktionary, but maybe I simply missed it.) It would absolutely be no small project, but at least in terms of raw size it would be no bigger than some of the other projects which Wikisource has completed over the past decade. RW Dutton (talk) 04:11, 1 May 2023 (UTC)Reply[reply]

I don't have the capacity to actually participate in such a project, though I would love to see it done. But I'd be happy to help with scan-wrangling and setup, not least in the hopes of getting it to use modern standards instead of the ad hoc approach used by our early massive projects. If we get together the interest to tackle it we should also liaise with Wiktionary to figure out the best way for them to reuse our efforts; either by letting them systematically link to it here, or by enabling them to import it afterwards in a useful way. We should also keep Wikipedia in mind, as they often need to cite dictionary and etymology (early attestations) for words for which OED1 would be very convenient. I don't think there's any useful entity to collaborate directly with there though. Xover (talk) 09:26, 1 May 2023 (UTC)Reply[reply]
Most of the OED1 has been PD since before Wikisource. It's valuable, but in sheer size I'm pretty sure it dwarfs the Encyclopedia Britannica project, and unlike the plain text of the encyclopedia, it's got an idiosyncratic phonetic system and complex formatting. I'm all for it, but it's a pretty overwhelming project larger than any done before on Wikisource.--Prosfilaes (talk) 09:45, 1 May 2023 (UTC)Reply[reply]
Why not have the w:Concise Oxford English Dictionary, first published in 1911? Yann (talk) 12:04, 1 May 2023 (UTC)
It's not an either/or but Wiktionary wants the obscure words and the extensive citations. At this point in time, another PD standard English dictionary transcription doesn't feel really worth it to me, unless it's the big one.--Prosfilaes (talk) 13:15, 1 May 2023 (UTC)Reply[reply]
Just my two cents: I have transcribed a couple of dictionaries at esWS (small ones, much smaller than OED1). I HIGHLY suggest that you use the ; and : syntax that is meant for definition lists, and do the formatting exclusively by IndexStyles. You can check the one I am proudest of there, this etymology dictionary. I also used a template for the abbreviations, to make the code cleaner. EDIT: Also, the workflow is massively improved if you make a post-OCR processing script that does the slow and hard part of adding the syntax, standardizing it, and adding the abbreviation templates, among other things. Ignacio Rodríguez (talk) 23:41, 5 May 2023 (UTC)
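A minimal sketch of the kind of post-OCR script suggested above, under stated assumptions: each OCR line is assumed to have the shape "headword. definition", the abbreviation set is a tiny made-up sample, and the template name `abr` is hypothetical (it is not a template confirmed to exist on any wiki). Real OED1 entries are far more complex than this.

```python
import re

# Hypothetical sample; a real list would come from the dictionary's own key.
ABBREVIATIONS = {"adj.", "n.", "v."}

# Assumed entry shape: everything up to the first full stop is the headword.
ENTRY = re.compile(r"^(?P<head>[^.]+)\.\s+(?P<body>.+)$")


def wrap_abbreviations(text, template="abr"):
    """Wrap known abbreviations in a (hypothetical) wiki template."""
    def repl(match):
        word = match.group(0)
        if word in ABBREVIATIONS:
            return "{{%s|%s}}" % (template, word)
        return word
    return re.sub(r"\b[A-Za-z]+\.", repl, text)


def to_definition_list(ocr_text):
    """Convert raw OCR lines to ; (term) and : (definition) wiki syntax."""
    out = []
    for raw in ocr_text.splitlines():
        line = " ".join(raw.split())  # normalise stray OCR whitespace
        if not line:
            continue
        match = ENTRY.match(line)
        if match:
            out.append(";" + match.group("head"))
            out.append(":" + wrap_abbreviations(match.group("body")))
        else:
            # No headword found: keep the line as a further definition line.
            out.append(":" + wrap_abbreviations(line))
    return "\n".join(out)


sample = "abate.  v. to lessen\n\nable. adj. having power"
print(to_definition_list(sample))
```

Keeping the output to bare ; and : lines leaves all visual formatting to the index stylesheet, which is the division of labour the comment above recommends.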

The header links (previous / next); subsection links, and other navigation links throughout this work are a complete mess. I found that the "main page" for The Mysterious Island pointed (via redirect) to part of Twenty Thousand Leagues Under the Sea. In fact there is no main page for The Mysterious Island. The "title" link, for another work, points to an internal part of the work, and not to the "title".

I may be able to tackle the problem this weekend, but it will likely take several hours, and I don't know whether I will have the time then. Given the importance and probable high traffic of this work, someone might want to go through and correct all the sections, pages, and links before then.

Incidentally, I only discovered the problem after someone noticed that the end of The Mysterious Island wasn't transcluded at all, and added the final pages. So there may be other parts where sections of the works have not actually been transcluded. The work needs a thorough going-over to fix all mainspace pages in all respects. --EncycloPetey (talk) 18:15, 3 May 2023 (UTC)Reply[reply]

To be clear, this does not appear to be the fault of any one person's actions. I see at least half-a-dozen experienced editors with edits made among the pages. It is more likely the result of lots of individual small-scale changes without anyone checking that the large-scale transclusion makes sense. --EncycloPetey (talk) 18:47, 3 May 2023 (UTC)Reply[reply]

@EncycloPetey: Very much agree, this issue is something of our own making, and it is only more obvious as we get mature and start addressing these serious major undertakings. Series works, and these posthumous collections of authors are problematic for our display, and we have not handled any of it brilliantly or uniformly. I separately noted this to a user for Loeb Classical Library and its subpages. I think that it is time that we review our approach to these collated works. I know that I harp on about the use of the Portal: namespace, though to me, for the work identified, the overarching "Works of Jules Verne" may be better as a portal, and each of the individual components could be set up as the works themselves.

I note that we would only take this approach to the volumised works where they are these latter collations that are essentially series of separate and combined volumes. — billinghurst sDrewth 01:47, 4 May 2023 (UTC)Reply[reply]

I agree about this being confusing, see the recent discussion about The Complete Works of Lyof N. Tolstoï and The Novels and Other Works of Lyof N. Tolstoï. Among the issues are:
1. Print / Bounded and hence typically index page vs. thematic divisions, especially where the collected work is itself made up of subcollections, and how that aligns with wikidata and previous / next
2. Creation and display of large number of works e.g. poems / letters
3. Handling portions still under copyright, leading to links to scans, copyright renewal tags and transcriptions in Main, note that these are often not tagged with a license violating our copyright policy
4. Headers aren't really designed to handle Collection --> Volume --> Section hierarchy where we might want to indicate per volume Author / editor as well as section / contributor
5. Looseness around licensing / categorization as they are now subpages
MarkLSteadman (talk) 02:32, 4 May 2023 (UTC)Reply[reply]
With respect to moving to Portal, it would be good to clarify our policy around metadata / linking / authority control, / redirects and how to classify them in the portal hierarchy. E.g. Works of Jules Verne has LCCN 14001405 and LOC Classification PQ2469 so presumably this would be a child and linked from Portal:French literature? MarkLSteadman (talk) 02:55, 4 May 2023 (UTC)Reply[reply]
@EncycloPetey While I appreciate your comment "this is not the fault of any one person's actions", some of these large works aren't initially set up well, and, as you say, can take a fair amount of time to fix. Personally, I found the page Works of Jules Verne so messy, that I never realised that Works of Jules Verne/Volume 5 was where the work actually started, when it was originally entered into the MC as proofread, but requiring the transclusion be split into chapters (I think the entire work was originally transcluded onto two pages). As a general rule, if a work seems "serious", with multiple experienced editors having made changes, then I am quite hesitant (and perhaps for other non-admin Wikisource users also) to clean anything up, besides the express splitting request in the MC, lest I receive an angry rant from someone about how I shouldn't have made what I thought were "improvements"... Perhaps removing the contents page from Works of Jules Verne, so people actually click on Works of Jules Verne/Volume 5 would be a start, or moving the auxTOC's to within the work.
I also could be mistaken, but I didn't think there was a "main page" for the Mysterious Island in Volume 5, it was just a heading, unless it was implied I should have split page Page:Works of Jules Verne - Parke - Vol 6.djvu/23 into sections, and transcluded "s1" as just the heading, onto its own page. Is there a convention for this? Usually, e.g. with the HG Wells series that is running, there are proper title and contents pages at the start of each work, rather than requiring an auxTOC like for Jules Verne.
Also, the Mysterious Island was in two volumes, and when splitting the transclusion, volume 6 hadn't been proofread yet (unless this wasn't what you were talking about).
Regards, TeysaKarlov (talk) 21:50, 5 May 2023 (UTC)Reply[reply]
I was not talking about Volume 6; only the parts that have been proofread. --EncycloPetey (talk) 21:58, 5 May 2023 (UTC)Reply[reply]

  Comment Compendium works that are split into volumes do not need to be (should not be?!?) reproduced here as subpages of the respective volumes. It just makes things harder than they need to be, to us the volume index page can still be created, without making any works subsidiary to a volume hierarchical structure. — billinghurst sDrewth 08:08, 6 May 2023 (UTC)

I think it will be hard to make it a "should not", as the inclination to mirror the index pages is so strong. We see this with magazines, where issue numbers would be perfectly fine being placed in Volume / Issue hierarchies. In this case, with volume introductions, people naturally incline to Volume 2/Introduction. MarkLSteadman (talk) 04:32, 9 May 2023 (UTC)
It is always tricky. This becomes very ugly and very silly quite quickly for no apparent win ...
1. Work of Jules Verne/Volume N/Introduction
2. Work of Jules Verne/Volume N/Title 1/Chapter NN
3. Work of Jules Verne/Volume N (one could argue that ToC and introduction can all sit on the /Volume N page)
4. Work of Jules Verne/Introduction to Volume N
5. Work of Jules Verne/Title 1/Chapter NN
6. Title 1 (Work of Jules Verne)/Chapter NN
What are we truly trying to achieve with the hierarchical mess of volumes, which are essentially worthless? Yes, we would need to give good guidance; however, with compendium works there is always time to assist and process. Whereas the volumes of serials are important, that is a whole publishing history, usually of one-off editions. We need to separate people's brains from thinking that they are the same just because they both utilise the word volume. — billinghurst sDrewth 04:56, 9 May 2023 (UTC)
Any chance we could start transforming some of your thoughts into prescriptive form, as an early draft of specific guidance on this? Just from the above I see you and Mark have managed to think this issue through to a much greater degree than I have, so speaking just for myself it would be a great help in trying to think about it (easier to have something concrete to agree or disagree with). Nothing fancy, just your thoughts phrased as if they were guidelines: this sort of work should be treated thus, but these other kinds should be done up in this other way.
The only things in this area I have a somewhat formed opinion on are The Satyricon of Petronius Arbiter, which was split into vol. 1 and vol. 2 for printing but is obviously a single work (the page numbers even continue across the two volumes) and which should be transcluded transparently as a single work; and things like Johnson's The Plays of William Shakespeare (1765), which is a cohesive set of volumes with critical commentary on Shakespeare's plays, unlike Portal:The Yale Shakespeare, whose volumes are individual editions merely published in a series with volume numbers (well, retconned, but...). Johnson's set also should live under a single top-level page somehow (not necessarily with a "…/Volume n/…" path component, but possibly). Xover (talk) 06:26, 9 May 2023 (UTC)
My initial thoughts are we have a four-fold split:
1. Volumes in a publisher series, i.e. those works which are published under different titles, generally by different authors at different times, but having the same publisher / editor / presentation. For these works I think some sort of non-mainspace-based organization makes sense, but the exact relationship between listing on the publisher's vs. creating a portal page with the appropriate categories / authority control (is this allowed under our policy, since we then have "author-like" authorities and "work-like" authorities?) / licensing (should the whole collection be tagged with an appropriate license?) is unclear to me, as is the cross-namespace redirecting.
2. Volumes in a serial publication, i.e. those works published under the same title but on a regular cadence. Currently these are generally held in main, which makes sense (they have a single name and "creator"), but here the main issues are basically all related to handling the varying publication dates / editors properly (e.g. indicating the varying editors and dates in the headers, license tagging). It certainly makes sense to go from Volume N to Volume N+1 from within the publication, and often the volumes have their own TOCs. So generally Publication Name (listing just the volumes / dates / editors) --> Volumes (listing per volume TOC or sub issues) makes sense, with Volumes doing previous / next in order. These were published and sold separately at different times.
3. Volumes in a collective work, i.e. those works which are now more uniform in content than case 1, with the scope being generally pre-defined up front, having indices / general table of contents, etc. These are the works between case 1 and case 4. At the most case 4-like you have works first published by authors, such as Essays: First Series and Essays: Second Series, Tales of My Landlord (1st Series)/Volume 1; on the other side we have Sacred Books of the East, The Harvard Classics, etc., where now we have multiple translators or authors, the volumes being dual titled on the title page.
4. Volumes in a single work, i.e. those that only have a single title, continuous pagination / chapter numbering etc. In main space and generally not a problem.
My suggestions for a way forward would be:
a. Getting guidance for 1. and 2. started as proposals for revising the appropriate sections of the documentation, to be able to state those in a more prescriptive way (e.g. defining how serial publications should be tagged for licensing, where such collections should live in Portal)
b. Writing up more suggestive guidance for case 3 outlining the various options as laid out in the discussion so far to be added as well
c. For this specific work, maybe write up some proposals on the talk page for the work and then vote?
MarkLSteadman (talk) 16:52, 13 May 2023 (UTC)
FWIW options 1 and 2 above seem the most logical. We can always add redirects. I am glad this issue comes up, as I have had a question about it a few weeks back, but I didn't get an answer. Yann (talk) 08:39, 9 May 2023 (UTC)Reply[reply]
Politely disagree. What does a volume level do here? I see no value in such. All published references are not going to be to the compendium, but to the work. All it does is add complexity to the title, for no benefit. — billinghurst sDrewth 11:48, 15 May 2023 (UTC)Reply[reply]

Translation: namespace and Wikidata

We seem to have an issue that the pages/works in our Translation: namespace are having their own wikidata items created as unique versions. Can I emphasise to our community that the Translation: ns is set up as a commodity to allow for translations, as there was that desire by the community, and there was no other real wiki where it could happen. The pages in Translation: ns are not true publications as they miss the requirements and notability for publication. They are dynamic documents with no date or place of publication, no translation authors, no copyright, no authority in translation.

There could be the argument made that these pages should never be listed at Wikidata as they fail the notability. I think that such approach is a little harsh, so if we are to link them at wikidata I have been doing so on the version from which the translation has been taken. So for example, a Russian newspaper article that is transcribed at ruWS has the translation here listed against that version, not its own. One can essentially equate this with how Wikipedia articles are all linked to the same item at Wikidata. — billinghurst sDrewth 01:38, 4 May 2023 (UTC)Reply[reply]

Translation namespace has many works that are not at other wikisources and thus are out of step with our guidance at WS:Translations. — billinghurst sDrewth 10:37, 5 May 2023 (UTC)

Library back up project

Hi, Some people started this on Commons: c:Commons:Library back up project. The idea is to upload books to preserve them, as Wikimedia is probably going to last much longer than other book hosting websites. Books that are still in copyright are uploaded, deleted just after, and added in the relevant "Undelete" categories. Just FYI as there is obviously a connection to Wikisource. First books added were in Chinese and Japanese languages. I added some more in English, French, and Indian languages. Yann (talk) 20:36, 5 May 2023 (UTC)Reply[reply]

You were aware of the efforts Fae made in terms of mirroring IA-hosted works on Commons? That might be something to continue as well. ShakespeareFan00 (talk) 21:39, 5 May 2023 (UTC)
Main problem with Fae's uploads is that they're mostly pdf despite there being a djvu option available and often the poorer quality scans were selected by the bot, which means that the OCR is dodgy. Beeswaxcandle (talk) 03:45, 6 May 2023 (UTC)Reply[reply]
Yes, all files I uploaded from IA were not uploaded by Fae. Yann (talk) 13:06, 6 May 2023 (UTC)Reply[reply]
Keep going on English Language works from IA. Areas of interest I have very long-term would be ancient English law reports, but that's a very very long term project, after the 18 months or so it will take to clean up lint-errors. ShakespeareFan00 (talk) 13:11, 6 May 2023 (UTC)Reply[reply]
Re: Fae, also the widespread uploading of non-US origin works that are still in copyright in their source country, breaking Commons's copyright policy. MarkLSteadman (talk) 13:14, 6 May 2023 (UTC)
That was something Fae was working on resolving, when certain ill-considered comments at Commons caused their departure. ShakespeareFan00 (talk) 13:17, 6 May 2023 (UTC)Reply[reply]
Just mentioning that determining copyright status automatically for English language works can be non-trivial. But in general, manually getting high-quality scanned files directly out of the digital collections from various libraries makes sense but this is likely to create a third or fourth copy of the same mediocre google books scans that are already hosted at Google, HathiTrust, and IA. MarkLSteadman (talk) 13:41, 6 May 2023 (UTC)Reply[reply]
There is a difference between uploading files with the wrong license, and uploading files still in copyright for the purpose of long-term safekeeping. The latter are deleted after being uploaded, and added to the relevant undelete categories. Yann (talk) 15:31, 6 May 2023 (UTC)
My point was that knowing what undelete category to apply en masse is tricky, unless going with publication date + 140 or something. MarkLSteadman (talk) 02:42, 7 May 2023 (UTC)Reply[reply]
Commons uses the publication + 120 years rule, when the author'(s) death date(s) is/are unknown or uncertain. Seeing the life expectancy at the time, this is quite sensible. Yann (talk) 12:25, 7 May 2023 (UTC)Reply[reply]
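As a rough illustration of the arithmetic behind the publication + 120 years rule mentioned above, here is a minimal Python sketch. It assumes the term is counted in whole calendar years and expires at the end of the final year, so a file would become restorable on 1 January of the following year. The function names and the category label format are hypothetical and used only for illustration; they are not taken from any Commons tooling.

```python
def assumed_pd_year(publication_year: int, term_years: int = 120) -> int:
    """Return the first calendar year in which the work is assumed free.

    Assumption: the term runs to the end of the calendar year, so the work
    is treated as public domain from 1 January of the next year.
    """
    return publication_year + term_years + 1


def undelete_label(publication_year: int) -> str:
    # Hypothetical label format, for illustration only.
    return f"Undelete in {assumed_pd_year(publication_year)}"


if __name__ == "__main__":
    for year in (1926, 1950):
        print(year, "->", undelete_label(year))
```

This only covers the fallback case where the author's death date is unknown; when a death date is known, the usual life-plus-term calculation would apply instead.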
@ShakespeareFan00: Please tell me if you have a list of files. Yann (talk) 20:54, 15 May 2023 (UTC)Reply[reply]
I don't have a specific list, but there were some suggestions on c:Commons:IA books that Fae didn't take up. If I think of some specific areas I'll let you know. ShakespeareFan00 (talk) 21:10, 15 May 2023 (UTC)
I've thought of some volumes that should be hosted on Commons:-
The ones listed as external scans in Template:Ruffhead_volumes are a prime candidate.
ShakespeareFan00 (talk) 17:20, 17 May 2023 (UTC)
@Yann: I compiled a list of "The English Reports" based on work @Technolalia: did. They are also candidates for the Library Backup project :) If you can add to the list on the portal, even better. ShakespeareFan00 (talk) 17:39, 17 May 2023 (UTC)
I will look at the Ruffhead volumes for a start. Where is your list of "The English Reports"? Yann (talk) 19:13, 17 May 2023 (UTC)Reply[reply]
Portal:The English Reports (Some are Google Books entries though...) ShakespeareFan00 (talk) 19:22, 17 May 2023 (UTC)Reply[reply]

Thanks for the post. I want to highlight that the project aims to systematically upload all old books from libraries. This includes those works that are near PD but not yet in PD. They can be deleted after the upload and be restored later.

I think the most imminent threat to library preservation is the Russia-Ukraine war. We should prioritize Ukrainian libraries. If Russian bombs hit Ukrainian libraries, the books could all be gone.

According to her, the Russians have damaged or destroyed almost 60 Ukrainian libraries since the beginning of the war.


Do any of you know about Ukrainian library websites with scans? Please provide.

The next priority would be Russian libraries. If the war escalates, Russia could be the target. I know many people hate Russia, but their books are innocent and need preservation too. --維基小霸王 (talk) 04:45, 22 May 2023 (UTC)Reply[reply]

Tech News: 2023-19

MediaWiki message delivery 00:36, 9 May 2023 (UTC)


A quick insource: search shows no evident uses of .tipsy onsite. — billinghurst sDrewth 03:36, 9 May 2023 (UTC)

What is the practice of adding {{Blocked user}} to a user page? There are some in the Category:Blocked users, but it must be just a tiny fraction of all the blocks (including various single use spamming accounts). I am asking as I have noticed that User:Shāntián Tàiláng (who has also been blocked in several wikis) started adding the template to dozens of user pages of blocked spammers (like User:Elvismartin1515). -- Jan Kameníček (talk) 16:37, 10 May 2023 (UTC)Reply[reply]

@Jan.Kamenicek: It typically isn't needed, as the bulk of what we block are spambots, and very occasionally an LTA. I would only be using it in a situation where we have blocked someone through community discussion. I politely asked the user to stop adding it, and deleted those labelled pages. We definitely don't need someone just coming in and needlessly (cluelessly?) adding those tags. — billinghurst sDrewth 11:44, 15 May 2023 (UTC)
this is a vindictive practice at other projects, to provide a scarlet letter for editors who will never be unblocked. i.e. "all your user pages belong to us". and we see a practice of exporting vindictiveness across wikis. --Slowking4digitaleffie's ghost 18:30, 25 May 2023 (UTC)Reply[reply]

I need an Aux item without auto-centering

The ToC for History of the Literature of the Scandinavian North does not list the Index, which I need to add to this page.

However, all of the AuxToC elements I can find auto-center on the page, which I do not want. Can someone help? --EncycloPetey (talk) 01:53, 11 May 2023 (UTC)Reply[reply]

@EncycloPetey: You can manually wrap it with the class "wst-aux-content" and then style that from the work's Index styles. That's what I did for The Satyricon of Petronius Arbiter: the toc has manual classes in /17 that are styled by rules in Index:…/styles.css (that, admittedly, I cribbed from CalendulaAsteraceae).
PS. That kind of toc I'd have done with a raw table. It feels a little wrong but it's going to be more robust and easier to do, and it gives you the containers you can hang classes off without adding ugly raw HTML to the page. Xover (talk) 05:51, 11 May 2023 (UTC)Reply[reply]
Under normal circumstances, I'd use a table, but the hanging indents and over-right page numbers led me to use basic formatting. I'd have used a template, but there is no way to use {{dent}} and adjust the right-hand margin.
@Xover I'd love to be able to use a manual wrap, but I have no clue how to do that. I can get the light green box to display, but not the notice about not being in the original. --EncycloPetey (talk) 16:01, 11 May 2023 (UTC)Reply[reply]
@EncycloPetey: I led you astray. That trick won't work with a straight up div. I converted it to a table layout just to illustrate how it could be done. Can you use that? Alternately I'll have to try to figure out a way to do it with the div approach, but that'll have to wait until my brain functions. Xover (talk) 18:42, 11 May 2023 (UTC)Reply[reply]
I'd like to avoid using a multi-page table unless it is absolutely required. My experience is that the syntax to create complex multi-page tables changes periodically, at which time well-meaning editors go through and update syntax, causing sections of the table to no longer transclude in the mainspace. --EncycloPetey (talk) 19:51, 11 May 2023 (UTC)Reply[reply]
@EncycloPetey: Note that I'd still recommend using table syntax for this, even in light of their complexity when spanning multiple pages. But I've reverted to the original and then found a way to make the Index entry an auxtoc one. It's not exactly elegant, but I think it's a reasonable tradeoff for a once-off. Does it look acceptable to you? Xover (talk) 05:42, 12 May 2023 (UTC)
I agree, and have done many tables of that form and have found that the approach with our index:(workname).css has been really useful in this regard. Having the css classes makes the table so much cleaner. — billinghurst sDrewth 10:04, 12 May 2023 (UTC)
@EncycloPetey: See the work My Life in Two Hemispheres which is that exact form and the implementation Index:My Life in Two Hemispheres, volume 2.djvu/styles.css. Adding the green colour would be the next simple addition. — billinghurst sDrewth 10:13, 12 May 2023 (UTC)Reply[reply]

  Comment @EncycloPetey: It is a while since I have done it; however, the green colour used to be in our global styles through subheadertemplate for background-color: #E6F2E6;. The respective header colours should all be readily callable through a global class, though I know that Xover has his arguments against global classes. There needs to be a happy medium to allow easy usability. — billinghurst sDrewth 02:45, 12 May 2023 (UTC)

The problem here is that applying both the background colour to the whole line and the "(not in original)" text requires there to be some structure (html element) that we can target and add it to after. In a table we can just tack it onto the table cell (td), but since this toc is using a div that contains (wraps) both the chapter title and the page number there's no structure there to add it to. The fix is to add that structure; the challenge is doing so without adding too much ugly html salad or interfering with the other formatting. Xover (talk) 05:20, 12 May 2023 (UTC)Reply[reply]

What is the difference between this and WikiBooks

Please help 15:35, 11 May 2023 (UTC)

On WikiBooks, they are writing new books. On Wikisource, we are converting previously published books and documents. --EncycloPetey (talk) 16:06, 11 May 2023 (UTC)Reply[reply]

I've implemented {{yesno}} in Lua so it can use the logic of Module:Yesno rather than re-implement it. Test cases are at Template:Yesno/testcases. Thoughts? —CalendulaAsteraceae (talkcontribs) 09:50, 14 May 2023 (UTC)Reply[reply]

@CalendulaAsteraceae: We import both the template and module from enWP. Modifying it locally means we either have to maintain it completely locally, or we need to re-merge every time we resync from upstream (and MW gives us no tools to do that). What's the value proposition to offset that cost? Xover (talk) 16:55, 14 May 2023 (UTC)Reply[reply]
@Xover: That is useful context, thank you. Given that context, I don't think it's worthwhile. —CalendulaAsteraceae (talkcontribs) 20:30, 14 May 2023 (UTC)Reply[reply]

Tech News: 2023-20

MediaWiki message delivery 21:45, 15 May 2023 (UTC)

A question about project scope

Does Wikisource consider a text like Five tips for reporting a scam (originally from to be in project scope accd. to Wikisource:What Wikisource includes? This is just one of numerous such web pages uploaded as PDFs to Wikimedia Commons by one specific user (see c:Special:Contributions/StuckInLagToad) and then added to Wikisource. I'm a Commons admin, and if it weren't for the corresponding Wikisource pages, I'd consider such PDFs as out of scope for Commons. --Rosenzweig (talk) 14:29, 17 May 2023 (UTC)Reply[reply]

  • Rosenzweig: I’m fairly active at WS:PD (where scope-related deletion discussions are held), and personally would consider these in scope, although it’s a close call. There is a related discussion about whether publication on government Web-sites is sufficient to be in scope going on right now. TE(æ)A,ea. (talk) 14:57, 17 May 2023 (UTC)
  • Not really, as it is essentially a dynamic webpage without clear authority, and is essentially an extract of part of the website as it isn't a standalone work. If it was a webpage elsewhere on the web, we wouldn't host it, so the fact that it is a government page makes little difference. It is a grey zone, and in this space I would also be considering the scope of Commons (educational) for pertinence. That said, we took a whole lot of NARA posters and like documents, though they did have a bit more of a historical bent, and are not solely transactional. — billinghurst sDrewth 22:51, 17 May 2023 (UTC)

Transcribe Text, Preload + Tesseract + Main namespace questions

  1. Would it be possible for the user to disable the blinking of the Transcribe Text icon, on the toolbar (Vector legacy skin 2010)? I am correcting hundreds of proofread pages and it's distracting, disturbing and interfering. Earlier, I thought that it only blinks when I log in and start editing. Now it's blinking on every page regardless of the status.
  2. Preloading and Tesseract scanning are both slow. I know that these are not caused by computer hardware or my internet speed. Do I have any additional options to speed up the process from its current speed? Is it because of Wikipedia servers? I am looking to explore the reasons before asking my ISP.
  3. Am I permitted to add to each Main namespace page I created, where appropriate, the {{default layout}} template with the "4" value? I have no other way to indicate my intention for those who just want to read the page and are unfamiliar with the displays. — ineuw (talk) 20:15, 2 May 2023 (UTC)
@Ineuw: Please present pages where you are having issues. I know of no blinking icon in my transcription work. Preloading what? How? Where? Examples please. First time load? Every time load? First contributors are able to choose a default layout for a work where it makes sense for the work. It is not meant to be about personal preference but what best suits the work. (Unsigned comment by billinghurst (talk).)
From what I've seen, the pulsing blue dot only appears the first time you proofread a page in a new browser. Simply logging out and back in doesn't trigger it, but using a new browser (or using your browser's anonymous mode) does. You can clear it by actually clicking on it, but unfortunately that has the side effect of actually transcribing the page you are looking at, which is undesirable if you are validating (and depending on your style, maybe undesirable in general). But after you've cleared it, it should be gone for good, unless, again, you switch browsers or always use private browsing. If it's not, that sounds like a phab ticket to me. — Dcsohl (talk) 14:10, 18 May 2023 (UTC)
@Billinghurst, @Dcsohl: I didn't answer because the dot disappeared and I felt foolish. Now, it's back again, and it pulses when the mouse pointer is near. It seems to appear only on created but untouched/unedited pages. It's not the dot that bothers me, it's the pulsating. It's quite disturbing to my vision. — ineuw (talk) 13:48, 2 June 2023 (UTC)



A long time ago I asked for help to upload a book from HathiTrust: Index:Brazilian short stories. Some user went to help and uploaded the individual pages. But a few days ago I managed to download the PDF from Google Books and upload it: File:Brazilian Short Stories.pdf. Could an admin update the Index page?

On the same author, there's this book translated by Aubrey Stuart. Does anyone have any clue who he was? Since the English-language text was published in Brazil, I want to be sure that it is really PD right here.

Thanks, Erick Soares3 (talk) 14:11, 18 May 2023 (UTC)

Yes, as it is published in 1926, it is in the public domain in USA, and therefore OK for Wikisource. Yann (talk) 14:38, 18 May 2023 (UTC)Reply[reply]
Yes, but following what you said at Commons, I also need to be sure that the translation is PD in Brazil, since the book was published in Rio de Janeiro. Erick Soares3 (talk) 16:43, 19 May 2023 (UTC)
And after doing some research, there's basically nothing on the translator. I'm not sure if I should just assume that he died long enough ago for it to be PD in Brazil. Erick Soares3 (talk) 17:14, 19 May 2023 (UTC)
If it is not in the public domain in Brazil, it could be uploaded to Wikisource. If it is in the public domain in Brazil and in USA, it can be uploaded to Commons. Yann (talk) 18:41, 19 May 2023 (UTC)Reply[reply]
@Yann: the thing is: there's no reference at all to when the translator died, so I'm not sure if it is a case of assuming that he died more than 70 years ago (it would be highly improbable that any heir would complain). Erick Soares3 (talk) 13:35, 20 May 2023 (UTC)
@Erick Soares3: Umm, we don't publish works here on the basis that someone doesn't complain. If the translation is not in the public domain in both the home country and the US, then it cannot be hosted at Commons (their rules); if it was published early enough to put it into the public domain in the US, then we can host it. — billinghurst sDrewth 15:32, 20 May 2023 (UTC)
@Billinghurst: my issue is more along the lines of what we should do when we don't have enough information about the translator (e.g. when he died). We just don't publish it here? I attempted to research him in a Brazilian newspaper archive and ended up empty-handed. Erick Soares3 (talk) 20:07, 20 May 2023 (UTC)
@Erick Soares3: English Wikisource reproduces English language texts based solely on US copyright provisions, and it would seem that the translated work was published early enough to put it out of copyright there. With regard to author research, please document what you can on the author's talk page, positive and negative searches. That is what I do and it all helps, especially as reference when populating Wikidata. It also helps us review when we move works to Commons from here. Possibly not for a long time in the case of the identified work. — billinghurst sDrewth 23:50, 20 May 2023 (UTC)

  situational comment We don't have a good methodology for identifying a more contemporary author where we have no death date and we are hosting a scan of their work. We cannot easily identify when WD may get the requisite death information, to inform us when something may be transferable. @Xover: can you think of a way that we can label/utilise {{do not move to Commons}} and whether the author death date is populated or not? We possibly can run an author capture for all works for which we hold scans, then run a check against a list of authors using Petscan, looking at those items in WD for the presence/absence/test of that date. Guessing that such a check is suitable once a year, in line with when we do our start of year move to Commons clean-up. — billinghurst sDrewth 23:59, 20 May 2023 (UTC)

Hmm. Tricky to automate. {{do not move to Commons}} doesn't know who the author is, unless we put a major effort into connecting all our scans → editions → works → authors at Wikidata. Templates/modules are also bad at catching changes (no death date → death date added). If we had machine-readable author information on all our File:s we could probably bot-script a periodic task that lists all files that currently have an author with a death date in the likely range for pma. 70 expiration. If we trust the |expiry= on {{do not move to Commons}} we could then filter out those on the assumption someone has manually checked it already. But at that point I think we're probably better off just manually checking all files with {{do not move to Commons}} which do not have |expiry= set. A tracking category for that should be trivial to add. Xover (talk) 05:41, 21 May 2023 (UTC)
@Xover: Category:Media not suitable for Commons/not listed (distinct from Category:Media not suitable for Commons/test). —CalendulaAsteraceae (talkcontribs) 04:20, 25 May 2023 (UTC)Reply[reply]
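The manual check suggested above (find files that carry {{do not move to Commons}} but have no |expiry= set) could be approximated offline along these lines. This is only a sketch under stated assumptions: it works on wikitext that has already been fetched into a dictionary, uses a deliberately simple regular expression that does not handle nested templates, and the helper names are hypothetical. On-wiki, a tracking category would do the same job.

```python
import re
from typing import Dict, List

# Simplistic match for a {{do not move to Commons|...}} call.
# Assumption: the template call contains no nested templates.
TEMPLATE_RE = re.compile(
    r"\{\{\s*do not move to Commons\s*(\|[^{}]*)?\}\}", re.IGNORECASE
)


def needs_manual_check(wikitext: str) -> bool:
    """True if the template is present but has no non-empty |expiry= value."""
    match = TEMPLATE_RE.search(wikitext)
    if match is None:
        return False  # template not used on this page
    params = match.group(1) or ""
    for part in params.split("|"):
        name, sep, value = part.partition("=")
        if sep and name.strip().lower() == "expiry" and value.strip():
            return False  # an expiry has been recorded
    return True


def files_to_review(pages: Dict[str, str]) -> List[str]:
    """Return the titles (sorted) whose wikitext needs a manual check."""
    return sorted(t for t, text in pages.items() if needs_manual_check(text))
```

Given a mapping of file titles to wikitext, `files_to_review` returns only the pages where someone still has to look up an expiry year by hand.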

This is just a mess. Someone needs to actually sit down and repair the relevant citation template, because it's completely *****d up in rendering here. ShakespeareFan00 (talk) 20:26, 21 May 2023 (UTC)

Why does someone need to do that? Document in WS: ns, and one part of a discussion whether to keep that subset of documents or not. You can just ignore it and move on. Don't come stamping your foot about things that show up in error reports, they are just error reports, they are not our boss, they do not set agenda nor priorities. — billinghurst sDrewth 21:31, 21 May 2023 (UTC)Reply[reply]
@ShakespeareFan00: I fixed this instance; it was a pretty simple error. No clue how to stop the bot from doing it in the future though. — Dcsohl (talk) 20:23, 22 May 2023 (UTC)

I think that this project is a waste of space on Wikisource. Why host it? As it is, it's useless. I downloaded the zip file, in which the book is in single leaves and various formats. This is all image work. I would not suggest recreating the text with our fonts.

If I upload the single pages to Commons, inserting them as is would be problematic because the pages are in landscape layout. One option: separate the text as an image, and place it above the images for a portrait layout.

Can anyone suggest what else can be done? — ineuw (talk) 20:26, 21 May 2023 (UTC)Reply[reply]

Here is how a different Book Dash work was done A Beautiful Day and Zanele Situ: My Story and there are other problematic transcluded works like When I Grow Up. MarkLSteadman (talk) 13:32, 25 May 2023 (UTC)Reply[reply]
Ugh. That's rather awkward, yes. I don't think we should add any more works like this, and certainly not encourage adding them. But what to do about the existing ones? Deleting them seems… harsh. But fixing them seems impossible with our current platform functionality (decent webfont support at a minimum, but there's more that would be needed to do it justice).
PS. Just to be clear: I think it'd be awesome if we could host these works. I just don't see any way we could, currently, that isn't doing both the work and our readers an injustice. Xover (talk) 18:10, 25 May 2023 (UTC)Reply[reply]

Text image over art image is what I imagined. Something close to A Beautiful Day. — ineuw (talk) 08:06, 27 May 2023 (UTC)Reply[reply]

Tech News: 2023-21

16:55, 22 May 2023 (UTC)

Page ns104 pages out of sync with Index

The scans are - File:The Federalist (Ford ed, 1898).djvu

However there seem to be some extant pages such as Page:The Federalist (Ford).djvu/1 and others?

What is the CORRECT index name, so that pages aren't being created under the "wrong" index? Thanks. ShakespeareFan00 (talk) 06:03, 24 May 2023 (UTC)

Probably because c:File:The_Federalist_(Ford).djvu is a redirect to c:File:The Federalist (Ford ed, 1898).djvu. Every page at Index:The Federalist (Ford).djvu should be redirected. Ignacio Rodríguez (talk) 00:52, 25 May 2023 (UTC)Reply[reply]
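To make the redirect suggestion concrete, here is a small Python sketch that only builds the redirect wikitext for a list of page numbers; it does not edit the wiki. The function names are hypothetical, and which page numbers actually exist under the old index name is an assumption to be checked by hand. In practice, pages that already hold proofread text would be moved rather than overwritten with a redirect.

```python
from typing import Dict, Iterable


def redirect_wikitext(target_index: str, page_number: int) -> str:
    """Build the wikitext for a redirect to a page under the target index."""
    return f"#REDIRECT [[Page:{target_index}/{page_number}]]"


def plan_redirects(
    old_index: str, new_index: str, page_numbers: Iterable[int]
) -> Dict[str, str]:
    """Map each old Page: title to the redirect text it should contain."""
    return {
        f"Page:{old_index}/{n}": redirect_wikitext(new_index, n)
        for n in page_numbers
    }


if __name__ == "__main__":
    plan = plan_redirects(
        "The Federalist (Ford).djvu",
        "The Federalist (Ford ed, 1898).djvu",
        [1, 2, 3],  # assumed page numbers, for illustration only
    )
    for title, text in plan.items():
        print(title, "=>", text)
```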

Policy against just dumping OCR raw in ns104?

It's already an unwritten guideline that contributors shouldn't just dump raw OCR into Page: namespace, but I am not sure if this had been phrased into a formalised guideline/policy.

What are the thoughts of other contributors? ShakespeareFan00 (talk) 08:26, 25 May 2023 (UTC)Reply[reply]

I believe you're talking about people marking raw OCR text as "proofread" in Page namespace?
In that case, I agree that it's pretty annoying because that's a lot of rework for someone to go through and recheck, and it breaks our workflow if that person then marks it as "validated". I don't mind if raw OCR text is merely saved in Page namespace as I expect someone will end up proofreading it eventually.
But in cases where someone marks raw OCR text as proofread, I think we should have a policy to revert those edits. And a talk page warning to the person doing that. Ciridae (talk) 08:40, 25 May 2023 (UTC)Reply[reply]
@Ciridae: We have fairly clear established standards for using page statuses, and inappropriate page statuses can definitely be reverted and should be taken up with the contributor on their talk page (politely). In farthest consequence we can block people for this, like other non-constructive behaviour, if lesser measures fail to address the problem.
But I'm pretty sure SF00 is referring to filling all the Page: pages in an Index: with raw OCR. We have no policy explicitly prohibiting that, which has led us to have ~1 million such pages (more than 30% of the total number of pages in Page: namespace). And if you don't mind such pages you are, probably, in the minority: I have no empirical data, but my experience suggests most contributors do not want to work on texts that are already filled like that. Xover (talk) 18:37, 25 May 2023 (UTC)
There's no policy against it (otherwise we wouldn't have ~1 million of them sitting around), and there's no strong precedent that they are deleted when nominated. Personally I think we should ban this, both because having such pages sitting there is problematic in themselves and because it encourages bad practices (dump the raw OCR, transclude it, and then just split; nobody wants to work on such texts so they sit there forever in that state making enWS look like a ghetto). But nobody much has expressed support for such a ban so far. I expect they'll come around when the number of raw OCR dumps exceed our number of actually Proofread pages, but then I am an eternal optimist. Xover (talk) 18:23, 25 May 2023 (UTC)Reply[reply]
I was certainly finding in some of the delinting efforts that more than a few of the 'unproofread' pages were raw dumps from OCR. I generally at least try to do a little cleanup on a new page, even if I save it as un-proofread.
Of course, match-and-split pages that haven't been proofread yet are a different issue, and those SHOULD be retained as, typically, there is some kind of standard being applied, even if it's not a direct scan match :)
ShakespeareFan00 (talk) 18:32, 25 May 2023 (UTC)
the text dumpers will always be with us. it is unclear to me that a policy with warnings and blocks is better than trying to pivot them to our better proofreading practices. a million page backlog is a feature not a bug. --Slowking4digitaleffie's ghost 18:36, 25 May 2023 (UTC)
Nobody is suggesting warning templates and blocks as a primary feature. But so long as we permit this practice we have no basis on which to ask them to behave differently, much less use a stern tone of voice when doing so. If we prohibit this practice we can simply tell them we don't permit raw OCR here and channel their energies into actually proofreading and perhaps gain a productive long-term contributor in the process. A million page backlog of raw OCR already exists: it's called the Internet Archive. Importing it here will only serve to drown the project in crap and make all our contributors leave in disgust. Xover (talk) 18:49, 25 May 2023 (UTC)
actually internet archive has 4 million books,[17] so let’s say 100 million pages. So we are a small percentage of scanned pages available. We ran this experiment, where german and english wikisource were the same size in 2008; and here we are 15 years later, and english has 162 times non-proofread pages, but also 5.9 times proofread and 2.3 times validated. non-proofread is flat over the last year, as proofread and validated increase, i.e. not out of control. i would like to keep up with the french increase in proofreading, but that would suggest recruiting more editors. --Slowking4digitaleffie's ghost 17:49, 27 May 2023 (UTC)
My other concern is that currently there is no way of distinguishing between 'raw' pages on which no effort has been made and those that aren't yet at proofread standard. By saying 'raw' pages aren't acceptable, a non-proofread page should, going forward, have had at least some human (or bot) cleanup on it. I will also note that sometimes a 'raw' OCR dump is one of the first things I replace on a Page before proofreading it, given that OCR technology has improved over the 15 years or so since Wikisource started. ShakespeareFan00 (talk) 19:13, 25 May 2023 (UTC)
In general I agree. Luckily we are getting to the point where we at least have a source / scan because tracking down a high-quality scan of whatever particular version was dumped is a huge pain.... At least the headers and footers should be cleared up and a minimum of effort put into cleaning up the nonsense from OCR like a cat walking on the keyboard. MarkLSteadman (talk) 21:00, 25 May 2023 (UTC)

  Comment The opening statement is very confusing as it doesn't give the circumstances. Are we talking about a page that is backed with a scan? Are we talking about the extraction of the text from the pdf/djvu layer? Are we talking about someone pasting text from another source into the Page: with a scan? That clarity would help when commenting.

  1. There is no issue with anyone loading a scan-backed page in Page: ns and marking it as not proofread. Zero. I will regularly do it for biographical works as often I want to be able to set up a search on them so I can find individual biographies to reproduce as needed, rather than a p. 1 to end scenario.
  2. There is a problem when anyone marks a page as Proofread without having proofread it.
  3. There is an issue where people transclude pages that do not exist, and
  4. there can be a problem, though not always, where they transclude pages that are not proofread.

With the last two dot points, it is incumbent on us to talk politely to that person and explain our processes to them. My exception to the last dot point can be in our biographical works: I can proofread a section of a page and transclude it; however, the whole page itself is not proofread, so it has not had its page status changed. Where these things are problematic and not going to be quickly resolved, then those page creations should be deleted. De-lintering pages should not be our primary concern. Lint errors are indicators; they should not drive changes to perfectly visible and consistent pages where they exist. That a page appears in a de-linter list should not be a source of criticism here where it is displaying fine. Please do not be overly judgemental about people's so-called behaviour unless it is clearly problematic. We experts should not be pretentious towards newbies, simply supportive. — billinghurst sDrewth 08:10, 26 May 2023 (UTC)

Thank you for addressing the concern I had. My concern was mostly about 'mass' creation of non-proofread Page: namespace pages (so there should nominally be a scan), where there wasn't at least some attempt to clean up some of the more glaring scan errors or omissions, or even what has been previously described as a 'plain-text' proofread.
I appreciate Wikisource isn't applying the same level of pedantry as Distributed Proofreaders though :)
ShakespeareFan00 (talk) 08:19, 26 May 2023 (UTC)
If you are seeing Page: ns pages showing up in the linter lists at various stages of proofreading, then you should be asking through a phabricator ticket that the Linter process allow filtering based on page status so you can focus on fixing Status 3 and 4. You should not be focusing on anything not proofread. It is no different to common typos that escape the view of proofreaders, like "bom" for "born", where I focus my efforts on status 3 and 4 pages, and ignore status 1 and 2. Demand usability for our project. — billinghurst sDrewth 09:12, 26 May 2023 (UTC)
We seem to agree where the focus should be. I will certainly consider raising a request for a specific feature as you suggest, unless there is someone here that would like to create a 'Page-status' highlighter as a local user script that puts a suitable background-color on the table-cells or links?
In respect of proofread Page: with Linter-errors:-
ShakespeareFan00 (talk) 09:40, 26 May 2023 (UTC)
In respect of filtering by 'page status', I'm not holding my breath, given that my previous ticket about limiting the reporting to 'content' namespaces did not produce the functionality requested.
ShakespeareFan00 (talk) 09:40, 26 May 2023 (UTC)
@ShakespeareFan00 A Page-status highlighter should be fairly trivial to implement, I can take a look at this over the week. Sohom Datta (talk) 14:40, 29 May 2023 (UTC)
@ShakespeareFan00 User:Sohom_Datta/page-status-highlighter.js is a quick and dirty script that I created to do it. Sohom Datta (talk) 15:40, 29 May 2023 (UTC)
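
For illustration only, the kind of filtering by page status discussed above might look like the following minimal Python sketch. It assumes you already have, from some export or query of your own, a list of lint reports paired with each page's proofreading level (1 = not proofread, 2 = problematic, 3 = proofread, 4 = validated). The `LintEntry` record, the `worth_fixing` function, and the sample data are all hypothetical; nothing here is an existing Linter or ProofreadPage API, and it is not how the user script linked above works.

```python
from dataclasses import dataclass

# Page quality levels as used in the Page: namespace.
NOT_PROOFREAD = 1
PROBLEMATIC = 2
PROOFREAD = 3
VALIDATED = 4


@dataclass(frozen=True)
class LintEntry:
    """Hypothetical record: one lint report for one Page: page."""
    title: str
    quality: int   # proofreading level of the page (1-4)
    error: str     # lint category reported for the page


def worth_fixing(entries, levels=(PROOFREAD, VALIDATED)):
    """Keep only lint reports on pages whose status is in `levels`.

    By default this keeps status 3 and 4 and drops everything that
    has not been proofread yet.
    """
    wanted = set(levels)
    return [entry for entry in entries if entry.quality in wanted]


# Made-up sample data, purely to show the call.
sample = [
    LintEntry("Page:Example.djvu/1", NOT_PROOFREAD, "missing-end-tag"),
    LintEntry("Page:Example.djvu/2", PROOFREAD, "missing-end-tag"),
    LintEntry("Page:Example.djvu/3", VALIDATED, "obsolete-tag"),
    LintEntry("Page:Example.djvu/4", PROBLEMATIC, "obsolete-tag"),
]

for entry in worth_fixing(sample):
    print(entry.title, entry.error)
```

Keeping the set of levels as a parameter means the same helper could also be used to list only the not-proofread pages, for anyone who wants to review raw OCR dumps separately.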

Selection of the U4C Building Committee

The next stage in the Universal Code of Conduct process is establishing a Building Committee to create the charter for the Universal Code of Conduct Coordinating Committee (U4C). The Building Committee has been selected. Read about the members and the work ahead on Meta-wiki.

-- UCoC Project Team, 04:21, 27 May 2023 (UTC)


Scans fail to be displayed in the proofreading extension. I started experiencing the problem a couple of days ago, when it often took a long time for the thumbs to be displayed, and I often had to reload the page several times. Now they seem to have stopped being displayed completely. I only receive a message <ocrtoy-no-text>. -- Jan Kameníček (talk) 11:00, 28 May 2023 (UTC)

@Jan.Kamenicek: I'm seeing broken image loads too today. I suspect an infrastructure issue and am trying to raise the WMF operations people.
PS. The weird error message you're seeing is from my OCR script. It tries to prefetch OCR on page load, and what you're seeing is a non-localized error text that just means the OCR backend produced no text (probably because it too failed to load the page image). Xover (talk) 11:28, 28 May 2023 (UTC)
@Jan.Kamenicek: I don't suppose you can pinpoint when the problem started with any more precision? It can help the server admins figure out where the root of the problem is. Xover (talk) 12:11, 28 May 2023 (UTC)
@Xover: I am sorry, I cannot. It must have been a few days ago when I first noticed the images of scans taking longer to appear, but I thought that it was just due to some connection problems and so I did not think about it much :-( --Jan Kameníček (talk) 12:56, 28 May 2023 (UTC)
@Jan.Kamenicek: Can you check now and see if it's better? I'm seeing somewhat slow image loads, but no images actually failing to load altogether, and since image loads are usually quite slow here we're at least within reasonable distance of normal. Xover (talk) 13:26, 28 May 2023 (UTC)
@Xover: The image appears, then almost immediately disappears, and after some time appears again. So the work is slowed, but at least possible. --Jan Kameníček (talk) 17:21, 28 May 2023 (UTC)
@Xover: Now it goes well, so hopefully the problem has been solved. Thanks! --Jan Kameníček (talk) 18:19, 28 May 2023 (UTC)
FYI, I have also noticed slow or no thumbnails for PDF files on Commons, so this is most probably not related to WS. Yann (talk) 15:19, 28 May 2023 (UTC)

Tech News: 2023-22

MediaWiki message delivery 22:03, 29 May 2023 (UTC)

Bulk De-linting.

Following the strong reactions I have received on my User talk pages concerning efforts to repair and reduce the number of LintErrors remaining in Wikisource, I'm abandoning the current effort until there are some guidelines written on how it should be done responsibly, if at all.

I also have one request requiring admin action: can an admin 'suspend' the AWB permission I have, as I am not sure I actually need that access to continue with normal proofreading/validation efforts? ShakespeareFan00 (talk) 23:50, 29 May 2023 (UTC)

Speedy deletion notifications

There is a bold notification in Recent changes about a speedy deletion request, but the Category:Speedy deletion requests is empty... What could be the reason? -- Jan Kameníček (talk) 07:33, 1 June 2023 (UTC)

It was a file and was deleted by Xover. I wouldn't fuss about whatever weird caching is going on. — billinghurst sDrewth 10:29, 1 June 2023 (UTC)
Yup. It was a file deletion that hit a problem somewhere down in the depths of MediaWiki, which probably means that the category table wasn't updated like it normally is. It'll probably clear up on its own in a couple of days (there are periodic maintenance jobs to clear out such things). Xover (talk) 11:44, 1 June 2023 (UTC)