Wikisource:Scriptorium

Scriptorium
The Scriptorium is Wikisource's community discussion page. Feel free to ask questions or leave comments. You may join any current discussion or start a new one; please see Wikisource:Scriptorium/Help. Project members can often be found in the #wikisource IRC channel (webclient). For discussion related to the entire project (not just the English chapter), please discuss at the multilingual Wikisource. There are currently 400 active users here.

Announcements

Proposals

Copydumps

Inspired by the above discussions, I would like to suggest that "copydumps" be added to WS:D#Precedent as they are frequently nominated on WS:PD and are generally uncontested. By "copydumps", I mean works that consist of copy-pasted OCR text, generally including page breaks and so forth - the sort of text that ends up in Category:Texts requiring OCR fixes. It is usually much faster to delete these works and then proofread them from scratch, than to try and clean them up before a match-and-split. There are occasional exceptions to this, which is why I am suggesting to add them as deletion precedent and not as a category for speedy deletion. —Beleg Tâl (talk) 14:31, 25 February 2021 (UTC)

  •   Support if it's just raw OCR, there's basically no benefit. My usual request that a scan is found and linked from the authors/portals to accompany the red links via ext/small scan link. Inductiveloadtalk/contribs 19:33, 2 March 2021 (UTC)
  •   Support --Xover (talk) 18:26, 3 March 2021 (UTC)
  •   Comment If you are talking solely about a copy and paste of an OCR text from archive.org, then I always feel we can have a quick review process at WS:PD, rather than a long tortured discussion. At this stage I would prefer that any existing work at least could come through PD for a quick review as there can be link removals, etc. I am always comfortable challenging any new addition like that, though feel that summarily deleting can be disenchanting for new users, and much prefer to move to their user space and discuss. I definitely don't think that they should be encouraged, nor attempted to be converted, other than straight replacement with transcribed and transcluded work. — billinghurst sDrewth 03:18, 5 March 2021 (UTC)
    • This is exactly how WS:D#Precedent should work. It can't be used for a speedy, but they are a shorthand for a full discussion of repetitive cases. There's still scope within WS:PD proposal for the work to be fixed and kept. The discussion should also not be insta-closed with this, there must be time allowed for responses and or remedial work. If the work is newly-added, working with the contributor directly to remedy issues is much better than slapping them in the face with a deletion proposal at all. I'd hope this is only used in cases of old-and-stale OCR dumps. Dragging any recently-active work to WS:PD because it needs improvement is a last resort when all other avenues of improvement have failed. Inductiveloadtalk/contribs 08:20, 9 March 2021 (UTC)
      • @Inductiveload: The discussion should also not be insta-closed Actually, it could be. The point is notification so that people have a chance to notice and open an insta-undelete discussion. Closed threads are not archived for two weeks or somesuch so there's a minimum time of visibility, and the deletion is findable in the archives if someone later on wonders where a page went. Not that anything is typically closed in any timeframe to which "insta" would be an apposite descriptor, but it does happen (things that are speediable but are brought to PD are sometimes insta-closed). --Xover (talk) 08:43, 9 March 2021 (UTC)
        • I'd certainly hope that nothing that's not long-abandoned is summarily closed without at least a chance for the contributors involved to have a say and/or chance to sort it out. As you say, WS:PD doesn't normally shake out as fast closes anyway, so it's not likely to be a major issue. My point is I don't think being covered by a WS:D#Precedent trumps the usual one-week minimum for discussion.
        • Also, expecting new users to know that, although we just nuked their page, it's not gone-gone and they can get it undeleted with no hard feelings on the proviso that they do X, Y and Z is unlikely to actually result in zero hard feelings all round. Then again, I've been saying in the "articles criteria" section above that using WS:PD as a "hi, please fix this" forum is already an overly adversarial process with a too-strong subtext of "if you don't comply, we're going to nuke it, so shape up, buster". Inductiveloadtalk/contribs 08:58, 9 March 2021 (UTC)
          • I don't think being covered by a WS:D#Precedent trumps the usual one-week minimum for discussion. That is, by design, exactly what it does. That's not to say I don't agree with your cautions about how they should be employed (with great power…), nor that deliberately (mistakes do happen) abusing the process and criteria shouldn't be suitably trouted, but those are behavioural issues.
            On the other hand I also think you're exaggerating the "adversarial" issue: experience shows that we do not in fact have any problem with the situations you describe. Rather the opposite. We have some edge cases in recent history but those would have been adversarial no matter what (for entirely different reasons). The vast majority of what ends up on WS:PD is very old (and its age is itself a complicating factor in those discussions) or is problematic for entirely different reasons. It is a much bigger problem that nearly anything that ends up on WS:PD has a significant chance of someone jumping in claiming they will bring it up to standard and then never really following through (doing just enough to keep it from being deleted, and that's an exceedingly low bar). I have several such on my todo list that I will eventually have to finish myself because bringing them back to WS:PD would really come across as adversarial (and I really don't want the grief). We certainly need to be sensitive to how a deletion discussion feels for those contributors who care about whatever is up for discussion, but that can't be a trump card that prevents much needed cleanup.
            Let's keep things in perspective, is what I'm saying. --Xover (talk) 10:22, 9 March 2021 (UTC)
            • I think we're (as per) on the exact same page, and I am now, (as per) drifting gently off-topic. I was referring to the slightly fractious result of the "incomplete" works discussions, which started with a multiple listing of works added by a single editor on WS:PD, with the "implied threat" of deletion becoming more contentious than the actual issue at hand (OTOH, sadly, none of the works did get improved, so it probably would eventually have still escalated to a WS:PD entry in this case). As I have said before we don't have an active "Wikiproject Fixup" or "Makeover Taskforce" or whatever (sounds like these are candidates: I have several such on my todo list), so WS:PD is the de facto venue for "remedial action clearly needed, but I don't want to do it myself" entries.
            • For raw-OCR copydumps, and, in particular, old copydumps, WS:PD is the right venue. If it's only just happened, I'd generally say "first contact" via a talk page is more friendly (but you know that and, I think, that's not what you're talking about). Well-meant copydumps are a fairly common first attempt at a constructive edit. If the user vanishes or refuses to engage in improvement, and it's not something anyone else wants to handle, then WS:PD is the final resort. Inductiveloadtalk/contribs 10:46, 9 March 2021 (UTC)
      • @Billinghurst: As Inductiveload said, "This is exactly how WS:D#Precedent should work." I am not proposing that we change anything about how we handle copydumps or WS:PD; what I am proposing is that we update WS:D#Precedent so that editors will be better informed about the precedent that has already been established on WS:PD for how we handle such works. —Beleg Tâl (talk) 00:54, 31 March 2021 (UTC)
          • "Also, expecting new users to know that, although we just nuked their page, it's not gone-gone and they can get it undeleted with no hard feelings" you should not expect no hard feelings when deleting other editors' work, without an attempt to find a scan before deletion. the "i as an admin can still see it" is no defense. go on down the "sword of damocles" road, see how many hard feelings you will create. you need to create a quality circle to fix non scanned backed works. deletion is not a quality improvement process, and is therefore power tripping only. Slowking4Farmbrough's revenge 23:36, 31 March 2021 (UTC)

Ability to Add Formatting Guidelines or Important Information to the Index Page

I think that it would be extremely useful to have the ability to add some formatting help or general guidelines on the Index page itself. I know that we can add them to the Discussion section, but this is out of the view of users. This would save users having to look through the vast help section to just find a tiny bit of information. Instead, we could have a mini-help section on every Index page to deal with any bit of formatting necessary. This would be easy for users to find and make it possible for them to contribute. Languageseeker (talk) 20:39, 7 March 2021 (UTC)

This is done on the Index talk page. See Index talk:Manual of the New Zealand Flora.djvu for one of mine. Then look at the Index itself. Note the banner across the top pointing to the fact that there are formatting guidelines on the talk page. Beeswaxcandle (talk) 05:05, 9 March 2021 (UTC)
Yep, but that's less visible to users. My proposal is to move such guidelines from the discussion page to the main index page for greater visibility to reduce user error and confusion. Languageseeker (talk) 06:37, 9 March 2021 (UTC)
@Languageseeker: Putting all the formatting instructions on the Index page would make rather a mess of the page. Some (not many) works have a lot of notes (e.g. transliteration tables, etc), and are sometimes subject to in-line discussions. Also, the Index pages are "secretly" actually just a big ol' template MediaWiki:Proofreadpage index template with a magical form-based edit interface. Stuffing arbitrary formatting instructions (e.g. tables) into the form fields is going to throw up frustrating edge cases, even if you had a good place to dump them on the page.
I suggest looking at ways to make {{Index talk remarks}} clearer. It's possible (in theory) to add a field to the Index page template, and pass it along to {{Index talk remarks}}, but I'm not sure what we'd put in such a field. Inductiveloadtalk/contribs 08:13, 9 March 2021 (UTC)
Less visible??? There is clear text that says look at the talk page for formatting. If they aren't seeing that then maybe they are not looking. The formatting on the Index: page really has a little bit of room at the top. There is a little scope to add something to the TOC field, as we did with DNB works in the early days though it was horrible for trying to add detailed instruction, hence why we moved to the talk page. — billinghurst sDrewth 11:10, 9 March 2021 (UTC)
I can see it, but even one additional click makes it less probable for a user to see it. Also, they don't always exist. It's easy to forget that some users may not know how to add bold text or italic. If a new user clicks on a transcription project, then they shouldn't have to dig through pages of documentation to find the answer. I'm basically asking for the ability to make a mini help page on every Index page. Languageseeker (talk) 12:43, 9 March 2021 (UTC)
@Languageseeker: but why is the index page the right place for this? If it's stuff like bold and italic, that's general formatting and the ">Help" menu at the top of the edit box on every page contains that. Index talk pages contain special formatting conventions for that index only. BTW, it's on my secret list to figure out how to make our edit box top bar more useful for what we actually do, since it contains not much of use for WS special sauce. Inductiveloadtalk/contribs 12:50, 9 March 2021 (UTC)
@Inductiveload, @Billinghurst: The index page is the appropriate space because it is the first thing that a user sees when they decide to participate in a transcription project. Based on the Index page, they will decide whether or not to contribute. Right now, the Index page is about as exciting as a library catalog record. I want to increase the motivation of users to participate in a transcription project. As an example, I wrote up what I would like to see on the Index Page here. Ask yourself the question, based on this information are you less likely or more likely to participate?
@Inductiveload: Your custom edit bar sounds awesome. I have some ideas if you want to hear them. Above all, I think that we should be able to customize the edit bar based on the specific book. Languageseeker (talk) 18:19, 9 March 2021 (UTC)
@Languageseeker: For most/many of our works there should be ZERO requirement for special formatting instructions. They typically have been used for more complex multi-volume works, or for some PotM works so that there is uniformity. Participation is not increased by putting formatting instructions on an Index: page. If one click is going to deter someone, then 200 to 500 clicks later for a 200 page book is going to kill them. Do not try to make an Index: or an Index: talk page into a Wikisource:WikiProject page. For complex co-operative works, especially those in a series, a good project page is a better way to go, and from there you can add sections and transclude those sections into relevant Index: talk pages. I would much rather look to encourage project pages, and active project talk pages, and invest in linking Index: and project pages to work better. — billinghurst sDrewth 21:18, 9 March 2021 (UTC)
That said, I think that there is some scope for how we can utilise Template:Index talk remarks and do some simple/additional/introductory text on the talk page of the index that could be transcluded back into the Index: page. We don't want to make it too much or too busy. — billinghurst sDrewth 21:23, 9 March 2021 (UTC)
On that note, what we could do, fairly easily, is add a field to the Index page form and template for "relevant Wikiprojects". I'm not sure where would be best for them on the page (e.g. top right or the main table).
It occurs to me that we could also add a drop down for setting {{index transcluded}} parameters too, rather than stuffing it into the TOC field. Inductiveloadtalk/contribs 21:25, 9 March 2021 (UTC)
@Inductiveload, @Billinghurst: Thank you for the detailed feedback that helped me clarify my thought. I definitely don't want to flood the index page with lots of extra content, but I do think a little help can be useful, especially for the tricky cases. This is the approach that video games take: a bit of guidance helps users to invest hundreds of hours in difficult tasks. I do think that we can provide a bit more information on indexes without cluttering it up.
I think that the idea of transcluding from the Index Discussion to the Index Page is a great idea. Perhaps, we could have a set section called Transcribing Guidelines that we could transclude. This section would contain an introduction to the work, its significance, and any special instructions. We would tell users to keep it short.
I disagree (my opinion only) that the Index: ns is the place to talk to them about the work; Index: and its talk space are solely a workspace and instructional, rather than contextual or informational; those aspects of a work belong in more obvious and in front-facing content namespaces, which for us have been main, WS:, Author:, or Portal:. I would encourage you to rethink that approach, they are not a place for bloat. Help:Namespaces gives that direction. — billinghurst sDrewth 01:22, 10 March 2021 (UTC)
@Billinghurst: I'm not asking to add a wall of text. I'm merely asking to add one or two sentences to explain why a user should care about this book and a few special formatting instructions (if any). We can even make the section collapsible to reduce bloat. For more detailed information, we can include a site matrix as we do on transcluded works. Look at the current Proofread of the Month, Women of the West: can you give me one reason that is obvious to the user as to why they should help to transcribe it? We can't just give users a book, tell them to transcribe it, and expect them to get involved. People need reasons to volunteer. Languageseeker (talk) 01:57, 10 March 2021 (UTC)
And I will say it again, the Index: namespace is not the place to do it, it is not a joe-public front-facing namespace, you are already too deep. I have no issue with all the encouragement, excitation, interest, etc. and it belongs before you get to an Index: ns. Keep it simple. — billinghurst sDrewth 04:01, 10 March 2021 (UTC)
That seems like a fairly elitist attitude. Every user is a new user at some point. Where are they supposed to learn? Languageseeker (talk) 05:36, 10 March 2021 (UTC)
What?!? You are trying to do multiple things, and think that the Index: page is the panacea. I am disagreeing with your proposal.

Every new contributor has a welcome message that gives them all the basics, and should lead them to our general approach, and anything general. We have agreement and general practice that where there is specific formatting that may be specialist to the work, it belongs on the Index talk: page. However, all explainers about the work, its place in a corpus, etc. do not belong on the Index: page, they belong in our other namespaces where they are more visible and more organised, and I have explained why. So please stop the throwing of insults. We fix up the issues where they lie, not poke them all into an Index: page. As I mentioned previously, we have used the ToC field for some commentary about works, eg. DNB volumes. — billinghurst sDrewth 09:50, 10 March 2021 (UTC)

@Billinghurst: I’ve been feeling bad about my poor choice of words. You didn’t deserve them at all. Please, accept my apologies. Languageseeker (talk) 20:22, 12 March 2021 (UTC)
If we take a cue from video games, an "in-page tutorial" makes more sense to me. Like when websites flash a little "helpful" (I find them rather a speedbump, but they must be useful or they wouldn't be used) marker over "new" items. You are fundamentally right, on-boarding and new-user-support, as well as complete but accessible documentation, is hugely lacking, and we must improve it.
BTW, The main "portal" for basic instructions is Help:Beginner's guide to Wikisource. Inductiveloadtalk/contribs 09:31, 10 March 2021 (UTC)
Absolutely agreed, PGDP include similar information on their project pages and it works. We cannot assume that users will know before they stumble on an index page. Even including a link to proofreading guidelines on index pages would be a huge help. We really need to make index pages less stark. I think there’s also fear of redundancy between headers and index page. Can’t we pull more information from the info on index pages rather than manually filling in headers? Languageseeker (talk) 20:00, 10 March 2021 (UTC)


New Request for Comment on Wikilinking Policy is open

I have just opened Wikisource:Requests for comment/Wikilinking policy. You will find there a proposed complete overhaul/rewrite of the current policy, which is now ready for review by the wider Wikisource community. It is proposed that the RfC will be open for two weeks. Please make your comments there rather than here. Beeswaxcandle (talk) 08:33, 14 March 2021 (UTC)

@Beeswaxcandle: I think 2 weeks / 72 hours is a little bit too aggressive, even for a presumed uncontroversial policy proposal like this. I understand the reasoning, but I just don't think the community is able to move that fast. For example, we have several long-time contributors that are currently in a phase where they check in only every couple of weeks. And I know for my own part that the local Covid status could easily make me too busy to check in here for weeks on end. We could still have an accelerated timeline (just not quite as accelerated as 2/72) if we notify of the proposal in a site notice and maybe even a talk page message to any established contributor that has been active in the last three months (or similar).
PS. And let me repeat my previous private kudos in public: you took my ongoing whining about the old policy and turned it into a concrete proposal for a new policy. Great work, for which I am extremely grateful! --Xover (talk) 09:25, 14 March 2021 (UTC)

Tweak archive settings for the Scriptorium

Currently the configuration for automatic archiving of the Scriptorium is set to archive threads in which there have been no new comments for 30 days, and to archive threads which are explicitly marked as resolved 31 days after the date they are marked as resolved. This means that in practical effect nothing ever gets archived by being marked as resolved.

In order to have the ability to clean out this sometimes a bit overwhelmingly long page I propose we change the interval for resolved sections to something more reasonable like a week, or possibly even 3 days. We rarely explicitly close threads here, and when we do it's the "Hey, how do I do this? / Here's how. / Ok, thanks."-type threads. Conversely, the threads that really need long-term visibility are either marked with the "do not archive until" tag (which lets you set an arbitrary future date before which the thread is ineligible for archiving) or, for things like the RFCs and proposals, once the discussion in the "Proposals" section is closed they should be posted as an announcement in the "Announcement" section where they will stay for an additional minimum of 30 days (the ordinary auto-archive interval).

Absent indications to the contrary my expectation is that this proposal is uncontroversial, so if there are no comments on this proposal I will take that as tacit approval. If anyone has any concerns with it I would appreciate comments to that effect, or even just "Wait, I have to think about it first". --Xover (talk) 09:17, 29 March 2021 (UTC)

And since nobody objected or yelled "Wait!", I've now tweaked the settings accordingly. I'll leave this thread open for a bit for stragglers, and after that anyone that wants to tweak the settings further can just open a new thread. --Xover (talk) 19:27, 9 April 2021 (UTC)

Moral disclaimers for certain works

There are certain works that have a core message or consistently incorporate certain themes that most people would find offensive and morally reprehensible. I'm thinking specifically about works that were made for the purpose of promoting white supremacy. Some notable examples of these are: Thomas Dixon's The Clansman and The Leopard's Spots; D.W. Griffith's films The Birth of a Nation (1915) and Intolerance (1916); Henry Ford's The International Jew (1920); Adolf Hitler's works; etc.

I think works such as these definitely need to be transcribed here, so that they can be viewed for historical purposes (as in, to understand what their arguments were and why they were made), and a transcription could for example make it easier for a user of our content to produce a rebuttal to said work. But the issue is that works like these are so bigoted in tone that their messages are simply indefensible, cruel, and morally reprehensible. I imagine many people who read our transcriptions of those works may get the idea that Wikisource's community, or the users who took the time and effort to work on the transcription, actually support the bigoted messages of these works, despite what Wikisource's project pages say about the project being NPOV.

So I propose that we create a disclaimer template, that we can put in the "Notes" section of the front matter page's header template. The template should say something to the effect of:

This text consistently promotes ideas that are particularly hateful or bigoted in nature. Please remember that Wikisource's community and its contributors do not necessarily endorse any opinions or ideas presented in any of its works, including this one. Works are presented as-is with no censorship involved, as transcription is done with a neutral point of view in mind, without bias for or against any particular ideology.

By the way, I think the disclaimer should only be included in works that have a consistently disreputable tone that may easily cause offense. I don't think that works such as Bobbie, General Manager or The Achievements of Luther Trant, which casually dropped the n-word a few times but don't bring up racial issues much at all, should be given the template. However, a book focusing primarily on racial issues, taking a white supremacist stance, would qualify. PseudoSkull (talk) 19:12, 8 April 2021 (UTC)

I very much appreciate the underlying issue, but I'd be inclined not to do this. While it would have benefit for the most extreme and uncontroversial cases, such as those you list, there would be a tremendous number of works in a "grey area" where editors disagree, and/or where we lack the resources to even detect or evaluate subtle but reprehensible views.
Perhaps an alternative would be to put some careful work into a thorough essay along the lines you suggest, and link to it from somewhere prominent on the Wikisource main page. Rather than trying to attach it to every reprehensible work, simply express our position clearly in one central place.
I always think it's worthwhile to think about the precedent of traditional libraries. Would you expect to find your local library had inserted a position statement into its copy of Mein Kampf? It seems unlikely, though I could certainly see them having a general brochure available at the front desk explaining why they carry such works. -Pete (talk) 19:32, 8 April 2021 (UTC)
@PseudoSkull: Just to tie my comment a little more closely to your proposal, and focus on how things would play out in practice: How would you imagine things going if somebody strongly disagreed with you, and felt that Bobbie, General Manager was indeed reprehensible? (I have no familiarity with this particular work, just following your example.) How would we come to a decision? Would the process tend to deplete the time or emotional energy of various volunteers? Would the end result, regardless of what it is, bring much benefit to the reader? -Pete (talk) 20:12, 8 April 2021 (UTC)

Bot approval requests

Inductivebot

The following discussion is closed and will soon be archived:
Bot flag granted.

Hi! Could I please request the bot flag back for User:InductiveBot? I'm starting to think about making a fix for the {{TOC begin}} family and that might need a bit of bot finagling to remove things like blank lines that will cause issues after the fix is made.

Also I'd like to use it for general maintenance tasks, moves, replacements, etc., like it used to do 10 years ago.

None of the tasks it would run are run constantly, they're started manually and supervised. Inductiveloadtalk/contribs 17:09, 8 January 2021 (UTC)

  •   Comment Just noting that since this is a request for reactivation of a previously approved bot—and one with an extremely low potential for controversy or disruption at that—the bot policy allows for an abbreviated approval process rather than a full minimum 4 days discussion + 7 days trial period. It does require the flag be granted by a `crat though (so ping Hesperian and Mpaa). original (2010) authorisation, 2013 confirmation, and I think the flag was removed when we purged the inactive bot accounts in 2017 or thereabouts but I couldn't be bothered to dig it up just now. --Xover (talk) 09:47, 9 January 2021 (UTC)
Flag set. I did not wait long, given the history of Inductiveload and their bot here. In case of disagreement, please continue the discussion and the outcome will be considered, as per process. Mpaa (talk) 18:31, 9 January 2021 (UTC)
Thank you! Inductiveloadtalk/contribs 23:47, 10 January 2021 (UTC)
  This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. Xover (talk) 18:31, 3 March 2021 (UTC)

Repairs (and moves)

Designated for requests related to the repair of works (and scans of works) presented on Wikisource

Arabella Move

The following discussion is closed and will soon be archived:
Moved, to the right volume.

Could someone move Index:The female Quixote, or, The adventures of Arabella (Second Edition V2).pdf to Index:Arabella (Second Edition - Volume 2).pdf. The source file for the first Index was deleted on Commons. 01:19, 5 April 2021 (UTC)

The deletion of the source file should not require a move. We simply upload a local version of that file with the same name as the original file, and everything is corrected. We have admins here with the ability to do this. No rename or move is required. --EncycloPetey (talk) 01:24, 5 April 2021 (UTC)
Never got moved or replaced. Languageseeker (talk) 23:39, 7 April 2021 (UTC)
Clarification. I think the pages had text in them that needs to be moved. Languageseeker (talk) 19:57, 8 April 2021 (UTC)
@Languageseeker: The pages associated with that index were from Vol. 1, not Vol. 2. Was that perhaps the reason the file was deleted at Commons? I'm guessing then that the moved Index: was also for Vol. 1, making even that move rather pointless. In any case, after a detour through Vol. 2, the pages have now been moved to the correct place under Index:Arabella (Second Edition - Volume 1).pdf. --Xover (talk) 15:44, 9 April 2021 (UTC)
@Xover: Oops, I confused myself with bad naming. Appreciate you fixing this and my mistake. Languageseeker (talk) 18:47, 9 April 2021 (UTC)
  This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. Xover (talk) 19:04, 9 April 2021 (UTC)

Move Oliver Twist to Oliver Twist (Boz Issue)

The following discussion is closed and will soon be archived:
Declined. We do not preemptively disambiguate page names, and the page name is not the primary way we identify the relevant edition.

This text is specifically the Boz Issue of Oliver Twist. I'm asking to move it to Oliver Twist (Boz Issue) to disambiguate it. Languageseeker (talk) 19:57, 8 April 2021 (UTC)

@Languageseeker: When you proofread another edition of Oliver Twist we'll move this one to disambiguate them and leave a versions page in its place. In the meantime it's just fine where it is. --Xover (talk) 19:06, 9 April 2021 (UTC)
  This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. Xover (talk) 19:07, 9 April 2021 (UTC)

@Xover, @Peteforsyth: Just to be clear, the current plan is this:

  1. Change main link on Author Page to Title Versions, e.g. Oliver Twist Versions
  2. Create Version Page.
  3. For the appropriate edition, create a redirect to Title
  4. Add version information to Title

In the future,

  1. Change main link on Author Page to Title
  2. Move Title to appropriate page
  3. Move Title Version to Title page
  4. Remove redirect

Am I understanding this correctly? Languageseeker (talk) 00:26, 10 April 2021 (UTC)

I am clearly stupid, as you are not making sense to me. Please do this:

  1. Transclude your new version, and add {{other versions}} to the top, pointing to "Oliver Twist"
  2. Put its information into a new WS item, list as an edition of Oliver Twist (Q164974)
  3. Link to it from the author page
  4. Come back here and identify the existing work that needs to be moved, and identify the new version

We will fix up the rest either alone or with assistance. — billinghurst sDrewth 01:15, 10 April 2021 (UTC)

  • @Languageseeker, @Peteforsyth: I can't quite grasp where it was that you took a wrong turn, but you're out in the weeds here.
    Neither Oliver Twist versions nor Oliver Twist (Charles Dickens Edition) should exist (and will be deleted as soon as the information in them has been verified to be preserved on a suitable page). The information currently in them belongs on Author:Charles Dickens, or possibly on a Portal: of some stripe if the inclusion criteria or organisation conflicts with the goals of an Author: page. Iff there is ever a Proofread transcription of what you call the "Charles Dickens Edition" transcluded onto a suitable mainspace page, then we'll move what you call the "Boz Edition" (i.e. what is currently on Oliver Twist) to a suitable disambiguated name, and then change Oliver Twist into a versions page. That versions page will contain a listing of, and links to, any Proofread edition that we currently host. It will not contain a complete listing of editions that exist of a work, nor a selection of the most significant editions, or… Just the ones we actually have proofread; either fully or in rapid progress towards completion.
    But the bottom line here is that so far there is not another proofread edition of this work, or at least not one that you have identified. So right now you're just making a mess and creating extra work for others over a hypothetical future proofread edition that may or may not ever materialise. I suggest you put your energies towards proofreading that other edition instead. --Xover (talk) 13:53, 10 April 2021 (UTC)

Please remove wikilivres from header template

The site has been dead for years and is not coming back (I no longer own the old domain anyway). —Justin (koavf)TCM 04:39, 10 April 2021 (UTC)

@Koavf:   Done --Xover (talk) 13:16, 10 April 2021 (UTC)

Other discussions

PD-anon-1923 again

The discussion of Happy Public Domain Day! has slipped into the archives without reaching a conclusion, so I would like to remind everyone that the last suggestion in the above mentioned discussion was to create {{PD-US|year of death}} and deprecate {{PD/1923}} and {{PD-anon-1923}}. Is this solution OK?

BTW: if we decide to keep calling the license templates for pre-1925 works {{PD/1923}} and {{PD-anon-1923}}, it would be necessary at least to adapt the latter one so that it could be used for 1924 anonymous works too. --Jan Kameníček (talk) 16:21, 20 February 2020 (UTC)

  Support the change — I don't really care but it makes sense —Beleg Tâl (talk) 16:36, 20 February 2020 (UTC)
  •   Support likewise —Nizolan (talk) 01:54, 21 February 2020 (UTC)
  •   Oppose because the name emphasizes US. The point of the templates is to cover both US status and international status. A template that names the US will cause confusion, especially to newcomers. --EncycloPetey (talk) 02:02, 21 February 2020 (UTC)
    @EncycloPetey: So in your opinion, even fixing a math error requires consensus? Without consensus we should believe 1+1=3 rather than 1+1=2? --Liuxinyu970226 (talk) 01:37, 1 April 2020 (UTC)
    Changes to established templates require consensus. We've had previous discussions and the community is divided on the issue concerning these templates. Proceeding with a change when the community has expressed such division is inappropriate because of the community discussion, not because of my opinion. --EncycloPetey (talk) 02:05, 1 April 2020 (UTC)
  •   Support. We are US-centric in our copyright approach. Given the number of times I've had to look up these type of templates here and on Commons, I might buy the idea that we should copy them, but otherwise, I think this is going to be as non-confusing as we get.--Prosfilaes (talk) 04:35, 21 February 2020 (UTC)
  •   Comment In your proposal, how do we code the year of the author's death for anonymous works? --EncycloPetey (talk) 04:38, 21 February 2020 (UTC)
    I am afraid I do not understand the question: anonymous works do not have any known author. I propose that for anonymous works we would have a template with similar wording as {{PD-anon-1923}}, but it would be called {{PD-anon-US}}. --Jan Kameníček (talk) 09:42, 21 February 2020 (UTC)
    That's also problematic, because the US is just one place that we display license information for. The current template displays that information for both the US and for countries with 95 years pma. --EncycloPetey (talk) 19:46, 21 February 2020 (UTC)

  Comment If there is a consensus to act, my recommendation is that we just move/rename the templates

  • pd/1923|yyyy -> PD-US|yyyy, yyyy=YoD, displays two templates as now
  • PD-1923 -> PD-US, where no $1 parameter it displays the one template
  • PD-anon-1923 -> PD-anon-US|yyyy, year of publication

and update the documentation around the place. Do any required tidying of the templates' internals, and fix double redirects. No need to deprecate anything, just move to the new nomenclature, and not worry about any of the old usage, or anyone continuing its use, as it matters not. — billinghurst sDrewth 11:15, 21 February 2020 (UTC)

  •   Oppose Firstly, because of the US emphasis. Yes, we follow US copyright law, but we also serve an international readership, not to mention contributors who are also bound by the copyright laws of other countries. Secondly, I think replacing "PD-1923" with "PD-US" is confusing. "PD-US" sounds like a generic template for "this work is PD in the US", but under this proposal it would mean "this work is PD in the US for the specific reason that it was published more than 95 years ago". BethNaught (talk) 22:16, 21 February 2020 (UTC)
    I do not understand in what way "the readership" is concerned in this… They see only the text of the template which is going to stay the same. --Jan Kameníček (talk) 23:08, 21 February 2020 (UTC)
      Comment I do not think that the suggested name of the template is more American-centred than the old one. E.g. {{PD/1923|1943}} has got two parts: "1923" is the American part referring to the American copyright laws, and the parameter "1943" is international referring to the countries where PD depends on the year of death. Nothing would change, only the American part would be called "US" instead of the nowadays non-sensical 1923, I really do not see any problem in that. --Jan Kameníček (talk) 23:08, 21 February 2020 (UTC)
    @BethNaught: The thing is that the only consideration we give to copyright compliance with regard to hosting is to the US copyright. Unlike Commons, we don't really care whether it is copyright in the country of origin. It is for this reason that I am reasonably comfortable with just stating PD-US and variants. The additional PD-old-70 and variants are for information only. — billinghurst sDrewth 00:43, 22 February 2020 (UTC)
  •   Comment I think this is an important issue, and I'd like to weigh in. I'm probably as familiar as (almost) any Wikimedian with the considerations around copyright law in various countries. But I do not see a clear statement of what the problem is that we're aiming to solve, or what the pros and cons are. I'm sure if I took an hour or two to dig through various archives, I could probably figure it out, but I'm not likely to have the time for that...nor should we expect every voter to do that. So given all that, I'm inclined to gently oppose, simply because I can't figure out what's going on, and it seems unwise to make a change that is difficult for community members to evaluate. Is it possible to sum up the issues more concisely so that I can give it more proper consideration, without having to do all the research myself? -Pete (talk) 22:44, 21 February 2020 (UTC)
    The problem I see is this: Until 1923 it made quite good sense to have a template called PD-1923, because it referred to the fact that only pre-1923 works were in the public domain. However, the situation has changed: currently the time border is 1925-01-01 (or 1924-12-31) and it shifts every year. I perceive it as very confusing to call the template for pre-1925 works PD-1923 (why 1923???). At the same time it does not make sense to change the name of the template every year (PD-1923, …, PD-1925, …); it would be better to find a fitting universal name. --Jan Kameníček (talk) 23:16, 21 February 2020 (UTC)
    Ah, that's very helpful @Jan.Kamenicek:, thank you. I had misunderstood, I thought you were proposing a change to the functionality in addition to the name change.
    I agree that changing the name (a) such that it specifies "US" and (b) such that it references the 95 year rule, rather than the (now outdated) 1923 rule would be worthwhile. I agree with others that we should be cautious about US centrism; but the reality is, with a current title that assumes that it relates to US law, without stating it, we already have a high degree of US centrism in the title. In my view, it's better to state "US" as part of the name, to make it clear to editors (who are the primary audience for a template name) that it's about US law. So, my suggestion would be {{PD-US-95}} or similar. That conveys that it's about US law, and it's about the 95 year rule. Text on the template page/docs could clarify that the 1923 rule is now outdated, and subsumed under the 95 year rule.
    A related issue that I find confusing: I don't understand why we need two separate templates for {{PD-1923}} and {{PD/1923}}. I think this proposal only relates to the latter; would we be leaving PD-1923 intact? A decision on this is probably a matter for a separate discussion, but I'd like to know for sure what the intent of this proposal is. -Pete (talk) 23:45, 21 February 2020 (UTC)
    PD-1923 has no decision-making; it applies just a single template and does not add the PD-old-nn variants. It has been utilised where we have been unable to determine a date of death, or for corporate publications which do not have PMA decisions. I addressed above that they would morph into PD-US, though we would need to handle them as parameterless. — billinghurst sDrewth 00:51, 22 February 2020 (UTC)
    Jan, that's not quite correct. Works published before 1923 are still in PD in the US for the same reason they were before. The 1923 date was a cutoff date beyond which we have never had to check. What has changed is that works that were under copyright later than that (from 1923 and 1924), and had their copyright renewed at one point, have now had that copyright protection expire. The works published before 1923 were not eligible for renewal and entered PD for a different reason than the works published in 1923 and 1924. It is one view to see the date as a shifting cutoff, but the cause of works from 1923 and 1924 entering public domain is actually different from those that were published prior to 1923. --EncycloPetey (talk) 03:13, 22 February 2020 (UTC)
    All works published more than 95 years ago are out of copyright because of the time since publication, no matter whether that's due to copyright notices, or renewals, or being in copyright for a full long term. For a work published before 1923, we've never been concerned about copyright notices or renewals, nor how long work published with copyright notice and renewal got in copyright. Why does it matter that a work published in 1924 may have got 95 years of copyright, whereas a work published in 1922 may have only got 75, when we don't really care about that 95 or 75 in the first place? We have no tag for "published abroad before non-US works got copyright in the US in 1891", because we don't care; it has always been sufficient for our purposes to say that it was published before 1923, and I don't see why it is not now sufficient to say that it was published more than 95 years ago.--Prosfilaes (talk) 04:59, 22 February 2020 (UTC)
    @Prosfilaes: I am presuming that this is in reference to the primary notice about copyright within the US, not the secondary notice for PD-old-nn which relates to copyright elsewhere in the world. The secondary notice can still apply for those of us not in the US, which is why we added it. — billinghurst sDrewth 05:08, 22 February 2020 (UTC)
    Yes, the primary notice. There's no need to worry about now-historical features of non-US countries, but certainly helpful to list the years since death.--Prosfilaes (talk) 05:18, 22 February 2020 (UTC)
    Yes and no. There are authors who have works published prior to 1925 who died late enough to still have works in copyright in their home country, so those notices are still very pertinent per Category:Media not suitable for Commons. — billinghurst sDrewth 05:30, 22 February 2020 (UTC)
    Right; I didn't mean to imply we should change the current secondary notices.--Prosfilaes (talk) 06:42, 22 February 2020 (UTC)
  •   Support U.S. copyright is of primary concern to Wikisource. Fixing the license so more 1923 and 1924 works appear on Wikisource even if still under copyright in other countries is so important. Abzeronow (talk) 19:46, 16 March 2020 (UTC)
  •   Support as this seems like the least problematic solution to the problem, and it doesn't make sense for us to keep delaying a resolution. Kaldari (talk) 18:09, 14 April 2020 (UTC)
  •   Comment It looks as though some people are hedging their bets: arguing for deprecating the template on the one hand but arguing for improving the template on the other. Since the template content has now changed, before this discussion has concluded, then procedurally we should recast all votes, since the template named in this discussion thread no longer has the content it had at the start of this discussion. --EncycloPetey (talk) 20:42, 24 April 2020 (UTC)
    Hedging their bets? It is somehow improper to try and improve Wikisource for now, whether or not this template gets deleted? If we're going to get pedantic about policy, where is it written on the English Wikisource that we should recast all votes?--Prosfilaes (talk) 06:41, 25 April 2020 (UTC)
    No need to restart the votes, as the changes have been reverted. The template is the same as it was before the voting started. No changes should be made to any template if there is a discussion and voting ongoing about its future. If the changes were allowed and at the same time we would have to restart the voting after every change, we may never come to a conclusion; not everybody has time to vote about the same problem again and again. --Jan Kameníček (talk) 09:50, 25 April 2020 (UTC)
  •   Support If a consensus is needed even to fix math errors, so be it. --Liuxinyu970226 (talk) 09:01, 7 May 2020 (UTC)
  •   Comment Please note that the new date, 1925, applies to all works except sound recordings (and maybe architecture). The date for sound recordings is 1923. That isn't shown in the local summary of the Hirtle chart, but is in the original. (I dropped a more detailed comment below.)--Sphilbrick (talk) 14:29, 20 July 2020 (UTC)
    Interesting point. If it is really so and if we need to show a license for sound recordings somewhere, we would probably have to create a specialized template for them.--Jan Kameníček (talk) 11:44, 2 December 2020 (UTC)
    Yeah. Sound recordings have a tortured history in US copyright law, but the end point is that the first recordings to have their copyright expire in the US will be in 2022, for those published before 1923. See w:Public_domain_in_the_United_States#Sound_recordings_under_public_domain.--Prosfilaes (talk) 00:51, 3 December 2020 (UTC)

So it seems to me that there is a weak consensus for the change. If so, it might be better to make it before the end of the year, so that works newly entering public domain can already be added with new templates.

The less important change is renaming the templates from {{PD/1923|year of death}} and {{PD-anon-1923}} to {{PD-US|year of death}} and {{PD-anon-US}}. It is only a change of the names of the templates; what the readers see will not be affected by this.

The more important change is adapting the latter one so that it automatically counts the years as {{CURRENTYEAR}}-95, similarly to what has been done e.g. here.

--Jan Kameníček (talk) 11:44, 2 December 2020 (UTC)
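
The arithmetic behind the {{CURRENTYEAR}}-95 idea mentioned above can be sketched roughly as follows. This is only an illustration in Python, not the wikitext of any actual template; the function names are hypothetical, and the sketch assumes ordinary published works only (sound recordings, noted earlier in the thread as following a different date, are not handled).

```python
# Illustrative sketch only: the function names are hypothetical and this is
# not the wikitext used by any Wikisource template.

US_TERM_YEARS = 95  # the "95 year rule" discussed in the thread


def us_cutoff_year(current_year: int) -> int:
    """Return the year before which publication means the 95-year term has run."""
    return current_year - US_TERM_YEARS


def is_past_us_term(publication_year: int, current_year: int) -> bool:
    """True if a work published in publication_year is past the 95-year term.

    Assumption: ordinary published works only; sound recordings follow a
    different date and are deliberately not covered here.
    """
    return publication_year < us_cutoff_year(current_year)


if __name__ == "__main__":
    print(us_cutoff_year(2020))         # cutoff year for 2020
    print(is_past_us_term(1924, 2020))  # a 1924 work, checked in 2020
    print(is_past_us_term(1925, 2020))  # a 1925 work, checked in 2020
```

For 2020 the cutoff comes out as 1925, which matches the 1925-01-01 border mentioned earlier in this discussion, and it moves forward by one each year without any renaming.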

Looks like an interesting, but a very long, discussion. Is there a way for a newbie to get involved without spending hours and hours? Thanks in advance, Ottawahitech (talk) 18:24, 10 December 2020 (UTC)

I have updated {{PD-anon-1923}} and moved it to {{PD-anon-US}}, and also moved {{PD-1923}} to {{PD-US}}, per discussion above. However, {{Pd/1923}} is locked and so I asked it to be moved to {{PD/US}} at the Admins' noticeboard. --Jan Kameníček (talk) 14:20, 26 December 2020 (UTC)

  This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. nothing new has been identified, I think we can call this complete — billinghurst sDrewth 01:20, 10 April 2021 (UTC)

Policy on substantially empty works

[This is imported from WS:PD, where it applies to multiple current proposals, and several other works].

We have quite a few cases of works that are "collective" or "encyclopaedic" in that they comprise many standalone articles of individual value, which are basically just "shell pages", with no substantial content of any sort, not even imported scans or Index pages. For example, and this isn't intended to make any statement about these specific works, they're just examples and they may well get some work done soon during their respective WS:PD discussions:

Based on the usual rate of editing for things like that, unless dragged up into a process like WS:PD, they'll remain that way a very, very long time. I think perhaps there might be a case to host a mainspace page for this work, even though there is zero, or almost zero actual content. Do we want:

  • Mainspace pages where there is a tiny bit of information like header notes, scan links and maybe detective work on the talk page (not in this case). This provides a place for people to incrementally add content. Also gives "false positive" blue links, since there is actually no "real" content from the work itself, or
  • Do not have a mainspace page until there's some content. Only host this in terms of author/portal scan links, much like we do for something like a novel.

Personally, I lean (gently) towards #2, but with a fairly low bar for how much content is needed. Say, Indexes, basic templates, a title page and one example article. Ideally, a completed TOC if practical, especially for periodical volumes/numbers. It is fair to not wish to transcribe entire volumes of these works, it is fair to not want to import dozens of scans when you only wanted one, it is fair to only want an article or two, but it's not fair, IMO, to expect the first person who wants to add an article to have to do all the groundwork themselves, despite having been lured in with a blue link. That onus feels more like it should be on the person creating the top-level page in the first place.

I do see some value in periodical top pages with decent lists of volumes and scans where known, because these are often tricky and fiddly to compile from Google books/IA/Hathi, so it's not useless work, even if there are no imported scans (though imported is better than not).

We currently have a large handful of collective works listed for deletion right now in various levels of "no real content", and, furthermore, every single periodical that gets added can fall into this situation unless the person who adds it does that groundwork, so I think we could have a think about what we really want to see here. Inductiveloadtalk/contribs 15:43, 3 July 2020 (UTC)

  • I believe that, if there is no scan as an Index: page, the main-namespace page should not exist unless it is being actively completed or is already mostly completed. A few pages (of the volume itself) is not very helpful, and is entirely useless if there is no scan given. TE(æ)A,ea. (talk) 15:59, 3 July 2020 (UTC).
  • I think such preparatory information would ideally be on more centralized WikiProject pages (for the broad subject), both for clarity and to assist in keeping different efforts consistent -- but that it certainly should be retained as visible to non-admins. I think that the red vs blue link issue is minor (but not totally negligible) and outweighed by the disadvantages of hiding the history of previous efforts. I strongly encourage redirecting such pages to appropriate WikiProject pages (after copying over the details there). JesseW (talk) 18:11, 3 July 2020 (UTC)
  • @JesseW: I agree that history shouldn't be deleted, but I think we should approach this in terms of what we want to see from these works, rather than what to do with the handful of examples at PD. There are hundreds of periodicals we could have but don't, and this applies to those as well. If we can come to a conclusion about what is and isn't wanted, we can make all the deletion requested works conform to that easily enough. Inductiveloadtalk/contribs 20:55, 3 July 2020 (UTC)
  • I think these pages are necessary to list index pages and external scans of multi-volume works (such as encyclopaedias and periodicals) especially if they are wholly or partly anonymous or have many authors or are simply large. I think it makes no difference whether such pages are in the mainspace, the portal space or the project space (except that it is harder to find pages outside the mainspace). The point is that these works often have so many volumes (often dozens or hundreds) that they must have their own page, and cannot be merged into a larger portal or wikiproject. If the community starts insisting on index pages, what will happen is the rapid upload of a large number of scans for the periodicals that already have their own page. Likewise if the community insists on transclusion. I also think it is reasonable to have a contents page in the mainspace, as it allows transclusion of articles. Most importantly, new restrictions should not immediately apply to existing pages that were created before the introduction of the restrictions. This is necessary to prevent a bottleneck. James500 (talk) 23:55, 3 July 2020 (UTC)
move the works to a maintenance category, and i will work them; delete them and i will not: i find your sword of Damocles demotivating. Slowking4Rama's revenge 01:55, 5 July 2020 (UTC)
@User:Slowking4: I am not proposing a sword of Damocles. I agree that the imposition of deadlines is counter-productive. I do not support the deletion of any of these pages. I would prefer to see them improved. James500 (talk) 04:38, 5 July 2020 (UTC)
TEA is on his usual deletion spree. not a fan. will not be finding scans to save texts, any more. he can do it. Slowking4Rama's revenge 00:15, 6 July 2020 (UTC)
The entire point of moving this here, and not staying at WS:PD is to decouple from the emotions that get stirred up in a deletion discussion. Let's keep deletion out of this. If we come up with some idea of what we do and don't want, then we can go back to WS:PD and decide what to do. I imagine that all that will be needed will be a fairly limited amount of housework to bring those works up to some standard that we can decide on here, and all the collective works there will be easy keeps. Hopefully with some kind of consensus that we can point at to outline a minimum viable product for such works going forward. There are hundreds and thousands of dictionaries, encyclopedias, periodicals and newspapers that we could/will, quite reasonably, have only snippets of. How do we want to present them? What, exactly, is the minimum threshold? Let's head all those future deletion proposals off at the pass, because deletion proposals often cause friction. Inductiveloadtalk/contribs 00:47, 6 July 2020 (UTC)
and yet deletion is the default method to "motivate" quality improvement. i reject your assertion that "emotions get stirred in a deletion discussion", rather, anger is a valid response to a repeated broken process being kicked down on the volunteers. it is unclear that a minimum threshold is necessary, rather a functional quality improvement process is. until we have one, you should expect to see this periodic stirring of emotions, as the non-leaders act out. Slowking4Rama's revenge 11:53, 9 July 2020 (UTC)
@Slowking4: Thank you for presenting this opinion, and I'm sorry if I have not made myself clear. We do need to figure out how to avoid a de-facto process of using WS:PD as an ill-tempered ad-hoc venue for "forcing" improvements on people who have somehow managed to generate works that are so in need of improvement that another user has nominated them for deletion. Please also consider looking at #Re-purpose_WikiProject_OCR_to_WikiProject_Scans for an idea to have a "functional quality improvement process" to which such works could be referred upon discovery rather than kicking them straight to WS:PD. If you have other ideas or you have previously suggested something similar to address these frustrations, you could detail them there. Personally, I think we should always prefer improvement over deletion. Exactly what the remediation is (refer to a putative WP:Scans, WS:Scriptorium/Help, directly WS:PD as now, or something else) is not what this thread is for. This thread is for discussing what, if anything, should be the tipping point for deeming a page "lacking" and doing something about it, whatever "something" is. I don't think I can be much clearer that this is not about deletion. If we also have a better venue for improvements, then that's even better.
For example, my personal feeling and !vote on A Critical Dictionary of English Literature is "keep and improve", despite it lacking scans or even links to scans, having only one article and no other content, not even a title page: in short, failing almost every criterion suggested so far in this thread. The only thing it does have is good text quality of the one entry. I personally do not think this work should be deleted, but I do think it should be improved in specific ways. The first half of that sentence is not the focus of this discussion, the second half is. Inductiveloadtalk/contribs 14:18, 9 July 2020 (UTC)
deletion threat has been an habitual method of communicating by admins since the beginning of the project. and text dumps have been habitual following in the gutenberg example. culture change and process change would be required to change those behaviors. we could make it easier to start scan backed works, but the wishlist was not supported. Slowking4Rama's revenge 21:00, 14 July 2020 (UTC)

I don't think this needs to be much of an issue going forward -- we all agree that it's OK to create Index pages for scans, even if none of the Pages have been transcribed yet; so the only case where this would come up is recording research where no scan has yet been identified as suitable to be uploaded. And for that, I still think a WikiProject page is the right location, not mainspace. (Or, if you must, your userpage.) JesseW (talk) 00:59, 6 July 2020 (UTC)

I realized I may not have been clear enough here -- in my view, the ideal process goes like this:

  1. Decide on a work you are interested in (in this case, a periodical/encyclopedic one) -- don't record that anywhere on-wiki (except maybe your user page)
  2. Find and upload (to Commons) a scan of one part/issue/etc of the work.
  3. Create a ProofreadPage-managed page in the Index: namespace for the scan. (You can stop after this point, without worry that your work will later be discarded.)
  4. EITHER
    1. Put further research (on other editions, context, possible wikification, etc.) on that Index_talk page.
    2. Proofread a complete part of the scan (an article from the magazine issue, a chapter from the book, an entry from an encyclopedia, etc.) and transclude it to the mainspace (and create necessary parent pages), and put the further research on the Talk: page of the parent mainspace entry.

If you can't find any scan, and don't want to leave your working notes on your user page, put them on a relevant WikiProject's page.

If you come across such research done by others and misplaced, follow the above process to relocate it to an appropriate place, then redirect the page where you found it to the new location. That's my proposal. JesseW (talk) 01:08, 6 July 2020 (UTC)

@JesseW: It's not clear to me in your above whether when you use the term "index" you refer to a ProofreadPage-managed page in the Index: namespace, or a general wikipage in the main namespace on which an index-like structure (and/or a ToC, or similar) is manually created. Could you clarify? --Xover (talk) 05:14, 6 July 2020 (UTC)
I meant the namespace. Clarified now. JesseW (talk) 05:17, 6 July 2020 (UTC)
  • Hoo-boy. Y'all sure know how to pick the difficult issues…
    My general stance is that: 1) scans and Index: (and Page:) namespace pages have no particular completion criteria to meet to merit inclusion, and can stay in whatever state indefinitely (there may be other reasons to get rid of them, but not this); and 2) the default for mainspace is that only scan-backed complete and finished works that meet a minimum standard for quality should exist there.
    That general stance must be nuanced in two main ways: 1) there must be some kind of grandfather clause for pre-existing pages; and 2) there must exist exceptions for certain kinds of works that meet certain criteria. I won't touch on the grandfather clause here much, except to say I'm generally in favour of making it minimal, maybe something like "No active effort to get rid of older works, but if they're brought to PD for other reasons they're fair game". The design of a grandfather clause for this is a whole separate discussion, and an intelligent one requires analysis of existing pages that would be affected by it. It is always preferable to migrate pages to a modern standard, so a grandfather clause is by definition a second choice option.
    Now, to the meat of the matter: the exceptions…
    We have a clear policy to start from: no excerpts. Works should either be complete as published, or they should not be in mainspace. But quite apart from the historical practices that modify this (which are somewhat subjective and inconsistent, so I'll ignore them for now), there are some fairly obvious cases that suggest a need for more nuance than a simple bright-line rule alone provides. The major ones that come to mind are: 1) massive never-completed projects like EB1911 or the New York Times (EB because it's big; NYT because new PD issues are added every year); 2) compilations or collections of stand-alone works with plausible claim to independent notability.
    For encyclopedias and encyclopedia-like things, we have to accept some subsets due to sheer scale of work. But when that is the grounds for exception, there needs to be some minimum level of completion. I'm not sure I can come up with a specific number of pages/entries or percentage, but it needs to be more than just a single entry (and, obviously, only complete entries). For this kind of exception to apply, I think it needs to be a requirement that the framing structure for it is complete: that is, the mainspace page should give a complete overview of the relevant work even if most of it is redlinks. That includes title pages and other prolegomena when relevant. For a periodical like the NYT, that means complete lists of issues with dates and other such relevant information (e.g. name changes etc.). For preference, these kinds of things should be in Portal: namespace or on a WikiProject page until actually complete, but that will not always be practical (EB1911 and NYT are examples of this). Mainspace or Portal:-space should never contain external links (i.e. to scans) or links to Index: or Page: space (except the implied link of transclusion and the "Source" tab in the MW UI provided by ProofreadPage).
    For exception claimed under independent notability there are a couple of distinct variants.
    Newspaper or magazine articles need to have a certain level of substance in addition to a specific identifiable byline (possibly anonymous or pseudonymous, and possibly identified after the fact by some other source, such as the Letters of Junius) in order to qualify. It is not enough to ipso facto be a newspaper article, a magazine article, a poem, or an encyclopedia entry. On the one hand we have things like dictionaries and thesauri, where an entry could be as little as two words. Or a one-sentence notice without byline in a newspaper. Or two rhymed lines (technically a poem) within a 1000-page scholarly monograph.
    To merit this exception it should be reasonable to argue that the "work" in question should exist as a stand-alone mainspace page (not that we generally want that; but as a test for this exception, it should be reasonable to make such an argument). This would clearly apply to moderately long entries in the EB1911 written by a known author that has their own Wikipedia article. It would apply to short stories or novella-length serialisations in literary magazines by authors that have later become famous (or "are still …"). It would apply to various longer-form journalistic material from identifiable journalists (again, rule of thumb is notable enough for enWP article), including things in magazines that have similar properties. For most periodicals the most relevant atomic (indivisible) part is the issue not the entry or article, but with some commonsense exceptions.
    It would, generally, not apply to things that are works by a single author, like a scholarly monograph that just happens to be arranged in "entries" rather than chapters. It would not apply to things that are essentially lists or tables of data. It would not apply to short entries in something encyclopedia-like or entries that are not by an identifiable author. The OED for example, iirc, is a collective work where entries are by multiple not individually identifiable authors (and each entry is mostly very short too); only the overall editor is usually cited.
    For works claiming this exception too the framing structure should be complete, even if most of it is redlinks. The same general rules about Portal:/WikiProject and no external or Index:-space links apply. An exception would be for periodicals where new issues enter the public domain every year; and we should generally avoid including even redlinks for the non-PD issues here (but may allow them in a WikiProject page). For non-periodical works in multiple volumes where some volumes were published after the PD cutoff, including listings for the non-PD volumes (but not links to scans; those are a copyvio issue) is ok.
    Poems, short stories, and novellas are a special class of works here. A lot of these were first published in a magazine (possibly serialized), and a lot of them exist as multiple editions in substantially the same form. Some exist in multiple versions. These should all primarily exist the same way as chapters as part of their various containing works; but there are some cases where we might want to have, for example, a series of connected pages of the poems of Emily Dickinson. I am significantly ambivalent about this practice, as it amounts to making our own "edition" or "collection" of her poems (in violation of several of our other policies), but I acknowledge that it is an established practice and it is something that has definite value to our readers. It may be that it is actually a practice that should be governed by its own dedicated policy rather than being handled within these other general policies.
    For the sake of example; applying this to the works Inductiveload listed at the start of this thread would shake out something like this:
    Auction Prices of Books—This work appears to have no sensible subdivisions and is in any case by a single author. I see no obvious reason to grant this work an exception, except under sheer volume of work and even there I would want to see both a substantial proportion completed and some kind of ongoing effort towards completion (no particular time frame, but definitely not infinite and definitely not as an effectively abandoned project). In a deletion discussion I would very likely vote to delete the mainspace pages here (but, as nearly always, to keep the Index: and Page: namespace artifacts). I don't see this as a reasonable candidate for a Portal:, nor really a good fit for a WikiProject (though I probably wouldn't object to a WikiProject if someone really wanted one).
    Central Law Journal/Volume 1—A single volume is too little, so I would want to see a complete structure for the entire Central Law Journal, with level of detail for each volume similar to the one existing volume. Each article in the journal can be individually considered for a stand-alone work exception; but for the collection I would want to see at minimum a full issue finished to justify having the mainspace structure, and preferably multiple issues (in a deletion discussion I might insist on multiple issues). Index: and Page:-space artefacts can, of course, stay. A Portal: might make sense for selections from the journal, of articles that meet the standalone work exception. A WikiProject to coordinate work and track links to scans etc. might be a decent fit here, if someone wanted that. As it currently stands I would probably vote delete for the mainspace artefacts (with option to move whatever content has reuse value to a non-mainspace page for preservation; and undeleting if someone wants to work on something is a low bar).
    A Critical Dictionary of English Literature—The top level mainspace page has near-zero value, existing only to link to the single transcribed entry. For a credible claim to exception to exist it would need to be a complete framework for the work as a whole, and significantly more than a single entry must be complete. I would probably also want to see ongoing work, unless a substantial percentage of the entries were complete. The single finished entry is eligible to claim a standalone work exception, but I think it probably would not meet my bar for that (I might be wrong; and the rest of the community might judge it differently). In a deletion discussion I would probably vote to delete all the mainspace artifacts here (as always keeping Index:/Page: stuff) but with a definite possibility that I might be persuaded on the one completed entry (an absolute requirement for convincing me would be to scan-back it: as a separate issue, my tolerance for grandfathering of non-scan-backed works is small, and effectively zero for new/non-grandfathered works).
    Bradshaw's Monthly Railway Guide—Would need a full framework and a number of individual issues finished to merit a mainspace page. I see no credible subdivisions for a standalone work exception, but might be persuaded otherwise if, say, one of the train tables was used as a (reliable primary) source in a Wikipedia article (implying some sort of notability beyond just being raw data). In a deletion discussion I would probably vote to delete all mainspace artifacts here. If anyone made the argument, I would entertain the notion that there is value in treating train tables like poems, and hosting a series of train tables like we do Dickinson's poems; but that would require a substantial number of them completed.
    For everything above my stance is nuanced by a willingness to accept temporary exceptions for things that are actively being worked: active being operative, but with no particular deadline to complete the work. We have differing amounts of time available, and some works are so labour-intensive or tedious to do, that my personal threshold for "active" is a pretty low bar to clear. If it's months and years between every time you dip in and do a bit I might start to get antsy, but days or weeks probably won't faze me. And that the projected time to completion is very long at that pace is not particularly a problem so long as it is not infinite. Within those parameters I would always tend to err on the side of letting contributors just get on with it in peace, regardless of any of the policy-like rules sketched above.
    I also want to emphasise that I think this is a very difficult issue to deal with. There are a lot of competing concerns, and a lot of grey areas that will likely take individual discussions to resolve. My balance point on this issue is partly formed by a broader concern about our overall quality (we have waay too many works of plain sub-par quality, and too many not up to modern standards) and a hope that by preventing the creation of these kinds of works (rather than deleting them after creation) we will be able to retain the good and desirable exceptions without dragging down quality, and without the traumatic and stressful events that deletions and proposed deletion discussions are.
    And for that very reason I am grateful this issue was brought up here for discussion, and I hope we can end up with some clear guidance, possibly in the form of a policy page, going forward. And in any case, since it will create de facto policy, this is a discussion that needs to stay open for a good long while (there are several community members that have not yet commented whose opinion I would wish to hear before closing this), and depending on how well we manage to structure the consensus, may also require a formal vote (up in the #Proposals section). --Xover (talk) 09:03, 6 July 2020 (UTC)
  •   Oppose. It is becoming clear that a policy on incomplete works in the mainspace is going to place enormous pressure on individual editors. I think it would be more effective to start a wikiproject devoted to scan-backing works that lack scans and so on. James500 (talk) 12:14, 6 July 2020 (UTC)
    • @James500: FYI, this thread was made in order to provide an exception to the current policy of "no excerpts". A literal reading of the policy as it stands has a plausible chance of coming down delete on the mainspace pages over at WS:PD. This thread is a chance to come up with a better way to support such partial collective works. That we have several substantially incomplete and abandoned collective works lolling around in mainspace is actually the result of laxity in respect to stated policy (not to say I think it's a bad thing). The deletion proposals, whatever you may think of them, are actually not in contradiction to policy. That said, as always, there is scope to adjust policy. Which is what this is.
    • Now, in terms of a WikiProject to scan back works, I think that is a good idea. See #Re-purpose_WikiProject_OCR_to_WikiProject_Scans above, which proposed to reboot Wikiproject OCR as a scan-backing Wikiproject. Inductiveloadtalk/contribs 14:40, 6 July 2020 (UTC)
      • The policy says "When an entire work is available as a djvu file on commons and an Index page is created here, works are considered in process not excerpts." A literal reading of that policy is that no scan-backed work is an excerpt (it is expected to be completed eventually). Further the policy refers to "Random or selected sections of a larger work". A literal reading of that expression is that it does not include lists of scans, or auxiliary content tables, as they are not "sections" (they are not part of the work), and that not every incomplete portion of a work is either "random or selected" (which would not include starting from the beginning and getting as far as you can, with intent to finish later). I could probably argue that an encyclopedia article or periodical article is a complete work. James500 (talk) 15:16, 6 July 2020 (UTC)
  • Nice wall of text, Xover (and I say that with great respect!) -- it generally makes sense and sounds good to me. As another hopefully illustrative example, take The Works of Voltaire, which I've been digging thru lately. I think this would very much satisfy your criteria as a large work, with sufficient scaffolding to justify the mainspace pages that exist for it. I would love to hear others' thoughts on that. JesseW (talk) 16:07, 6 July 2020 (UTC)
    @JesseW: Yeah, apologies for the length. Brevity is just not my strong suit.
    The Works of Voltaire probably qualifies on sheer scale of work, yes. I don't think the current wikipage at The Works of Voltaire is quite it though: as it currently stands it is more WikiProject than something that should sit in mainspace (its contents are for Wikisource contributors, to organise our effort, not our readers, who want to read finished transcriptions). It also mixes a work page with a versions page in a confusing way. So I would probably say… Move the current page to Wikisource:WikiProject Voltaire; create a new The Works of Voltaire as a pure versions page, linking to…; The Works of Voltaire (1906), that is set up as a work page with the cover and title (and other relevant front matter) of the first volume, and an AuxTOC (and possibly also the {{Works of Voltaire}} volume navigation template). I don't know how tightly coupled the volumes of this edition are (does the first volume have a common ToC or index of works for all the volumes?), so some flexibility on format may be needed to make sense. But as a base rule of thumb it should start from a regular works page and deviate only as needed to accommodate this work (mainly the size is different).
    In any case… With a volume or two completed (they're only ~350 pages each) I'd be perfectly happy having something like that sitting around. With less than that I'd possibly be a bit more iffy, but it's hard to put any kind of hard limit on that. And with somebody actively working on it I'd be in no hurry whatsoever regardless of current level of completion.
    PS. I'm pretty sure a large proportion of the contents of these volumes are works that would qualify under "standalone works" that could exist independently in mainspace, regardless of what's done with the The Works of Voltaire page. Even his individual poems and essays can presumably make a credible claim here (because it's Voltaire; less famous authors would have a higher bar). Better as part of the edition, but also acceptable on their own. --Xover (talk) 16:56, 6 July 2020 (UTC)
  • @JesseW: I personally take no issue with this page's existence (actually I think it's a nice work and a good way to allow an important author's works to be slotted in piece-by-piece). I have some general comments which overlap with this thread (written before Xover's reply, so pardon overlap):
    • First off, I differ with Xover in terms of the scan links: I think they're better than nothing, and I don't see much value in duplicating the volume list onto an auxiliary page just to add scan links. However, I can sympathise with the sentiment that our mainspace shouldn't direct users off-wiki (or at least off-WMF). But if we don't have the scans, and that's what the user wants, they're leaving anyway. Real answer: import moar scans!
    • No scan links are necessary where the volume exists in mainspace and is scan-backed (e.g. v3)
    • Ext scan links should only be used when there is no Index page or imported scan. Use {{small scan link}} or {{Commons link}} when possible (e.g. v2)
    • The first volume list could probably be in an AuxTOC to mark it out as WS-generated content.
    • The "Other editions" section belongs on an auxiliary namespace page (Talk, Portal or Wikisource). I suggest the Talk page is best in this case. Inductiveloadtalk/contribs 17:35, 6 July 2020 (UTC)
  • @Xover: I am in agreement with the majority of what you say. Particularly, I think a framework around any collective work (be it a single-volume biographical dictionary or a 400-issue literary review spanning 80 years) is the critical prerequisite, plus at least some scans, the more the merrier. Where I think I differ:
    • I am inclined to be a bit more relaxed in terms of how much of a work we need. As long as a single article exists, it's not "trivial" (e.g. only a short advert or some incidental text like a "note to correspondents", as opposed to an actual article), it's well-formatted and scan-backed, and a complete framework exists, including front matter and a TOC, such that it is easy for anyone to slot in new pieces, I'd be fairly happy. Lots of periodicals have all sorts of tricky bits like tables of stocks or weather tables and writing into policy that those must be proofread in order to get the "real" articles into mainspace would be a chilling effect, in my opinion. If you allowed an exception, it would be verbose and tricky to capture the spirit without saying "unless, like, it's totally, like, hard, man".
    • I am not dead against scan links in the mainspace at the top level, when such a top-level page exists. See my comments on Voltaire above. I am against them where they could sensibly be on an Author page and they are the only mainspace content.
    • I am ambivalent on the presence of, e.g., disjointed train timetables. It's not my thing to have a smattering of random timetables, but as long as they're individually presented nicely, it's not too offensive to my sensibilities. I might question the sanity of someone who loves doing tables that much, but whatever floats the boats! Also, I think that this might circle back to "good for export" - a mark which certainly would require completed issues or volumes. If you want to get that box ticked, you have to do it all.
    • Re the "notability" aspect of individual articles, I'm not really bothered by that, as I don't think we'll see a flood of total dross because few people really want to take the time to transcribe 1867 articles about cats in a tree from the Nowhere, Arizona Daily Reporter, and, actually I think some of the "dross" can be quite interesting in a slice-of-life kind of a way (always assuming well-formed and scan-backed). And the real dross is usually so bad (no scans, raw OCR, etc) that it can be dealt with outside of this topic. I think part of the value of WS is the tiny, weird and wonderful, not just in blockbusters like War and Peace and Pulitzers. I think I might like to see more of our articles strung together thematically via Portals, but that's another day's issue. Inductiveloadtalk/contribs 17:35, 6 July 2020 (UTC)
      • @Inductiveload: We appear to be mostly in agreement. But… instead of me dropping another wall of text on the remaining points of disagreement, maybe that means we're in a position to try to hash out a draft guidance / policy type page with the rough framework? Then we could go at the remaining issues point by point. Because I think I'm in with a decent chance to persuade you to my point of view on at least some of them, but this thread is fast getting unwieldy (mostly my fault). It would also probably be easier for the community to relate to now, and much easier to lean on in the future. --Xover (talk) 18:31, 6 July 2020 (UTC)
        • @Xover: If there are no more comments forthcoming after a couple of days, I think that makes sense. I don't want to railroad it: considering we have at least one !vote for "do nothing", I'd like to see if there are any other substantially different opinions floating about. Inductiveloadtalk/contribs 17:41, 7 July 2020 (UTC)

The quantity of text here has grown far faster than my ability to absorb it, so rather than continue to put it off, here's my position: I don't see any problem with transcriptions that are scan-backed, even if the transcription only covers a small fraction of the entire scan. If Sally chooses (say) to transcribe a favorite story, that happened to be published in an issue of Harper's back in the 1890s, and goes to the trouble of uploading the full issue, but only creates pages for the one story that interests her, I think that's great. It doesn't matter to me whether she intends to work on the other pages or not. If it's not scan-backed, but it's fairly high quality, I am personally willing to do some work trying to locate a scan and match it up to the text; I'd rather we take that approach, than deletion, though of course deletion is the better option in some cases where the scan is very hard to come by.

If all this has been said above, or if I've misunderstood the topic, my apologies. Please take this comment or leave it, as appropriate. -Pete (talk) 02:00, 8 July 2020 (UTC)

Apologies, I see I had missed the point.

I disagree with Xover's statement that a top-level page for a publication, with a link only to a single article within the publication, has "near-zero value." Such a page can serve an important function linking content together in ways that help the reader (and search engines) find the content they're looking for, or understand the context around it. For instance, A Critical Dictionary of English Literature is linked from the relevant Wikidata entry. The banner on the Wikisource page clearly tells a Wikisource reader that they won't find a full transcription here; and with a simple edit, it could link to a full scan on another site, or (with perhaps a little more effort) even transcription links here on Wikisource. This page has been here since 2010; we don't have any way of knowing what links might have been created elsewhere in the intervening decade. (I do think that new pages like this should not be created without a scan at Commons to be linked to.) -Pete (talk) 02:12, 8 July 2020 (UTC)

I'm really bad with walls of text, so I have only read a tiny portion of the above discussion. But I want to mention a couple of things that I think are worth considering in this discussion.
  • Most of the time, a mainspace "work" that is only a table of contents, but which has none of the actual content, and is not actively being worked on, can be (and should be) deleted as No meaningful content or history under our deletion policy.
  • A mainspace work that has only a little bit of content, but that content is a work unto itself within the scope of Wikisource, should be kept. Most periodicals are like this. For an example, see the Journal of English and Germanic Philology which only has one hosted article, but that hosted article is scan-backed and firmly within scope.
  • On some occasions, empty mainspace works do have value. I ended up creating the page The Roman Breviary, despite containing no actual content, mostly because there are a lot of works that link to it, using many different titles, and if someone uploaded a copy of the work under one title then many of the links would remain red because they point to different titles of the work. This could be easily solved by creating redirects to a simple placeholder page, so I did. I tried to make the placeholder page as useful as a placeholder page can be, as it contains useful information about the history and authorship of the work, and links to the Index pages where the transcription will take place.

Anyway those are my 2 cents, sorry if they are redundant —Beleg Tâl (talk) 00:40, 29 July 2020 (UTC)

Proposal

Since there has been no extra input for a month, and not wanting this section to get archived without at least attempting a proposal, I have started a proposal #Collective work inclusion criteria above. Inductiveloadtalk/contribs 11:00, 25 August 2020 (UTC)


I've created Bradshaw's Monthly Railway and Steam Navigation Guide (XVI) - it couldn't be done on one page, due to the very high number of template transclusions. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:52, 1 September 2020 (UTC)

Harmonizing

Is it worth harmonizing these: Category:Obituaries in The New York Times by adding in the year, or taking out the year in the one with a year? And removing the word "obituary" from the ones with it? --RAN (talk) 06:42, 5 February 2021 (UTC)

@Richard Arthur Norton (1958- ): It seems that the preferred format for New York Times articles (at least when they are backed by a scan) is to include the full date if it is known, as The New York Times/YYYY/MM/DD/Title. If no scans back them, they should at least be listed on Portal:The New York Times with complete information so that they can eventually be migrated to a scan-backed version (a cursory view of pages in the obituary category seems to show that they're listed on the portal page). As long as all the articles are accounted for, I don't think it really matters if they all follow the same convention or not, as New York Times article coverage is pretty spotty anyways. Personally, I'd move the one with a year in its title back to its old location and leave it for the time being. -- Mathmitch7 (talk) 16:00, 2 March 2021 (UTC)
@Mathmitch7: What do you think of removing the word "Obituary" from the titles? I don't see that anywhere else, and as I look at the actual scan, it is not present. I think the editor added it so it is recognized as an obituary, but we have the category giving the same information. Also, have you noticed that there are half a dozen different ways that newspapers are aggregated? If you go to Category:Newspapers of the United States and click on a few, you can see six different ways that articles are aggregated: there are manual bulleted lists, automated aggregated lists, manual sortable tables, there are empty calendar indexes like New York Post, and a few one-off experiments. There are lists and charts with annotations and summaries of the articles, and ones with just titles. Do you have an opinion on what is optimal, or should we let people keep experimenting for a while? Article titles have no standard, some have years, some have full dates. Articles themselves are a mixture of djvu files, jpg index pages, raw unformatted ASCII text, and formatted Unicode/HTML text. --RAN (talk) 21:25, 2 March 2021 (UTC)
  • @Richard Arthur Norton (1958- ): I think removing the word "Obituary" makes sense except in cases where it is apparently the article title in the text itself. One thing that I recently came across was some old discussion on a proposed redirect policy, which points out that as wikisource is a source for other wiki projects like Wikipedia, moving (particularly moving without leaving a redirect) can create issues when those projects depend on, transclude, or link our source text. This seems like an issue that could especially come up wrt obituaries. Not that we're responsible for WP's links being accurate, but something to keep in mind as we contemplate this kind of change. I say move the pages to the same title without "Obituary" for now and leave other moves (to dates subpages, eg) for when somebody (perhaps, you) takes it upon themselves to clean up NYT articles more generally. -- Mathmitch7 (talk) 16:25, 3 March 2021 (UTC)
I didn't even think of cross wiki linking. I don't even think there is a way to search for it. My way of linking involves only linking to the Wikidata entry for the news article that is stored here. The article titles stored at Wikidata get changed during the move automatically. So I would link to [[wikidata:Q105939027|Bishop of Mombasa is Dead]], Qids are stable. From the Bishop of Mombasa's entry at Wikidata I would link Described_by_source=Q105939027. --RAN (talk) 02:50, 13 March 2021 (UTC)
  • Our newspapers and periodicals are a mess, and in desperate need of tidying up, but there is no universally accepted "best" paper to use as a model. I think harmonising the titles (particularly moving articles to a subpage of their parent work, rather than floating untethered in mainspace) is a good first step as at least they are all together. Advice from Wikisource:WikiProject Newspapers is Newspaper Name/YYYY/MM/DD, but usage of this is spotty. The use of Portals for article content needs to be standardised too, but that's a far bigger job. Inductiveloadtalk/contribs 09:01, 3 March 2021 (UTC)

On future texts I'd like to work on

Afternoon, fellows. In case you're keeping track, it's been a month and a half since I came back to WS—after an eight-year lull induced by Google+ and whatever else. (Already did a Thornton W. Burgess story about an otter family on the final weekend in January; spent more than a fortnight on an 1880s Malagasy grammar [because Madagascar isn't that well-represented here]; and at this writing, have begun transcribing that 18th-century history on my homeland by Thomas Atwood [because coverage of the Nature Island here is similarly all but nonexistent].)

That said, there are no fewer than three works presently on my wanted list, all of which I've tried to track down with the help of my home county's inter-library loan service. In order of publication, along with status checks:

Honorable mention to...

  • Happy Jack (1920, Thornton W. Burgess): Which (and whose original cover) has surely seen better days—because what GBooks and Hathi have got has clearly shown its age at the dawn of the 2020s, not helped by the more or less dismal quality that plagued early scans of theirs. (Originally announced on my user page as one of the four titles marking Burgess' début at WS; The Adventures of Jerry Muskrat has since replaced it in my queue.)

To Xover (talkcontribs), SurprisedMewtwoFace (talkcontribs), Jan.Kamenicek (talkcontribs), and anyone else interested: If you've got anything further on the matter, then please ping back. Take care, see you in the backlists, and happy Ash Wednesday. (Which a lot of folks outside the Catholic movement are hardly aware of.)

"Never ends for this Captain, does it?"

Slgrandson (talk) 17:16, 17 February 2021 (UTC)

@Slgrandson: The University of Illinois at Urbana-Champaign (who provide lots of scans to the IA/HT) have a copy of Fairies in the rare book stacks. They might throw you a bone with their digitisation service considering it's not a very common book (only 3 entries at Worldcat, all in the US) (or they might want their palms crossing with silver like the NLS). Inductiveloadtalk/contribs 17:51, 17 February 2021 (UTC)
@Slgrandson: I would be happy to help you with Dr. Dolittle's Circus once you upload the scan of it. I know one of the Gatsby's is a reprint, so I don't think we'll have too much of a problem even without a first edition. I look forward to working with you on it! (SurprisedMewtwoFace (talk) 19:49, 17 February 2021 (UTC))
@SurprisedMewTwoFace: I already got an ILL copy of Circus—and am likely to begin scanning with my SD card tomorrow. Can confirm it's a 1950s reprint. (Just a reminder so that the archiving bot doesn't catch it yet.) —Slgrandson (talk) 01:25, 18 March 2021 (UTC)
If you want a first edition, several libraries near me have it but it might take me a bit to get them, take pictures with my phone etc. MarkLSteadman (talk) 00:32, 18 February 2021 (UTC)

Removing references to Wikisource in Wikipedia

While I'm at it, I have removed the reference to Wikisource "The Barbarism of Slavery" from w:Charles Sumner. WS's text is completely undocumented, has no authority, and users of Wikipedia are not helpfully referred to it. They would be better referred to archive.org, where at least you can see the publication information for what you are looking at.

Many texts exist in multiple, differing versions. There's only one version of "The Barbarism of Slavery"? OK, but say so. Otherwise WS is unreliable.

Undocumented texts like this are worse than useless. They are outright harmful, because they give a misleading impression that something reliable has been created. I'm not doing this systematically, but I will continue deleting links in Wikipedia to undocumented texts in Wikisource. I think by doing so I am helping WP users.

I had exactly the same argument with Project Gutenberg when it started. Undocumented transcriptions of texts create more problems than they solve. Deisenbe (talk) 10:08, 24 February 2021 (UTC)

@Deisenbe: If you are following enWPs processes, especially with consultation at w:Wikipedia:Reliable sources/Noticeboard, then I am not certain that we would have an issue if you can demonstrate that the work is unreliable. enWP is enWP and they have their guidance for editors to follow. The early additions here are early additions, and they are not how we would do a work today; that said if we don't have evidence that a work is truly problematic, they stand as they are as unsourced, supplied transcripts. Please utilise {{no source}} if a work has no source; please use {{fidelity}} if you think that the transcript is an issue. If you think that a work is truly problematic then consider nominating it for deletion at WS:PD. — billinghurst sDrewth 10:59, 24 February 2021 (UTC)
Obligatory soapboxing: this is why we need to start raising our quality bar, and most critically to start requiring works to be scan-backed and Proofread. Deisenbe puts it more directly than most, but the points are well made and valid regardless of whether most reusers of our texts are able to articulate them. Our balance is far too far in the direction of contributor convenience and preserving sunk cost (expended effort) no matter what, and we need to start shifting it toward better quality and a higher bar if we're ever going to make any appreciable progress (our backlogs are growing by the day). --Xover (talk) 12:18, 24 February 2021 (UTC)
this is what they do at german wikisource, and we are ten times their size (proofread pages), and widening the gap. is that what you want? Slowking4Rama's revenge 15:59, 25 February 2021 (UTC)
While it's obviously suboptimal to have unsourced versions kicking about, they're still better, IMO, than no version (unless there are actual concerns about fidelity, which is extremely rare). In this case, the source text is evidently the 1863 edition: Index:The Barbarism of Slavery - Sumner - 1863.pdf. Of course, the original IP contributor should have made a note of the edition and provenance back in 2007, and failing that, should have been prompted for one at the time, and {{no source}} applied until such source was provided.
In my opinion, it's more constructive to report such issues here and request research be done to isolate the exact edition and/or scans (or do it yourself if so inclined), than to just delete the links from enWP. As a community, we are generally pretty amenable to making our works useful to enWP, but due to immense backlogs of unsourced junk in the mainspace (driven by the mismatch between how easy it is for a new user to drive-by with a dump and the effort and learning curves of doing it right <insert moan about lack of documented expectations>), we simply can't fix everything up front. But, if something is specifically reported to us because it's linked to by enWP and needs a bit of digging, I don't think I'm overstepping to say we'd happily do it when we can. Inductiveloadtalk/contribs 18:40, 24 February 2021 (UTC)
"Undocumented texts like this are worse than useless." no, deletionists are worse than useless: some of us are here to write an encyclopedia, and some of are here to delete one. pontificating tl;dr on "partner" projects about quality issues, is a profound culture problem that is a cancer eating away at the community. and in this case the scan is ready to be migrated. after all, we transcribed the 12000 EB1911 article references, that were cut and paste in wikipedia with only an endnote. i leave it to others to interact with the adversarial, and their issues. Slowking4Rama's revenge 15:53, 25 February 2021 (UTC)
  • Don't forget to update the Wikidata entry The Barbarism of Slavery (Q19079234) with the publication information, I connected it to the author. --RAN (talk) 00:18, 26 February 2021 (UTC)
    The Wikisource copy needs to be placed on a separate data item, because it is an 1863 published edition, and not an 1860 copy (the date on the main data item). When we have a sourced edition of a work, it should be placed on its own data item so that the publication information can be added to that data item. It would be cross-listed on the main data item using the property "has edition" on the work's data item, and "edition of" on the edition's data item. --EncycloPetey (talk) 00:34, 26 February 2021 (UTC)
  • I leave that to the more experienced editors working on publications, the speech itself could have its own entry at Wikidata, and an entry for each published edition. I usually create one entry, and let the more experienced editors break them into smaller pieces. Recently at Wikidata, people have been breaking churches into: the building, the congregation, and the cemetery. --RAN (talk) 02:40, 26 February 2021 (UTC)
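A minimal sketch of the work/edition split described above, for anyone unfamiliar with the modelling. It uses plain Python objects rather than any real Wikidata API; the `Item` class, the `link_edition` helper, and the edition identifier `Q_EDITION` are hypothetical placeholders, and only Q19079234 and the 1860/1863 dates come from the discussion.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    """Hypothetical stand-in for a Wikidata data item."""
    qid: str
    label: str
    statements: dict = field(default_factory=dict)
    wikisource_page: Optional[str] = None  # sitelink to the enWS copy, if any


def link_edition(work: Item, edition: Item) -> None:
    """Cross-list a work and one of its editions in both directions."""
    work.statements.setdefault("has edition", []).append(edition.qid)
    edition.statements.setdefault("edition of", []).append(work.qid)


# The work item keeps the date of the speech itself.
work = Item("Q19079234", "The Barbarism of Slavery", {"date": ["1860"]})

# The edition item (placeholder ID) carries the publication details and
# the link to the Wikisource transcription of that specific edition.
edition = Item(
    "Q_EDITION",
    "The Barbarism of Slavery (1863 edition)",
    {"publication date": ["1863"]},
    wikisource_page="The Barbarism of Slavery",
)

link_edition(work, edition)
```

The point of the split is that publication details and the Wikisource link live on the edition item, while the work item only points at its editions.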
I've restored the Wikisource link in Wikipedia and added the page name. The previous copy is stored in User:Ineuw/Sandbox10.— Ineuw (talk) 21:36, 22 March 2021 (UTC)

Which dynamic layouts are actually used?

As part of the ongoing cleanup of the JS that provides Dynamic Layouts, I'd like to canvass opinions on which of these people actually use:

  • Layout 1: default (full width with narrow gutter for page numbers)
  • Layout 2: 36em column (similar to itWS and frWS)
  • Layout 3: wide right gutter, header on right (but seems this isn't working: the header is off the screen for me). A little similar to deWS, but they have a much more data-heavy, vertical, header format (e.g. Ein Friedensstörer)
    • Only 5 pages set "Layout 3" as default
  • Layout 4: identical to Layout 2, but with a width of 540px (this is not ideal, as it encodes a fixed ratio of pixels to font size. At the common default of ~16px=1em, it's about 34em, so functionally the same as Layout 2). Notably, this will look very constrained if a visually-impaired user has set a higher default font size.
  • Proposed Layout: Appears similar to Layout 3, but with a full-width, working header. Confusing name, since it's been "proposed" for nearly a decade. No-one appears to set this as default.
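As a rough check of the pixel-versus-em point made for Layout 4, the conversion can be sketched as below. The function name `px_to_em` and the larger font sizes are illustrative assumptions; only the 540px width and the ~16px default come from the list above.

```python
def px_to_em(width_px: float, font_size_px: float = 16.0) -> float:
    """Convert a fixed pixel width to ems at a given base font size."""
    return width_px / font_size_px


# Layout 4's fixed 540px column at the common default and at two
# larger (assumed) font sizes a reader might have configured.
for font_size in (16, 20, 24):
    print(f"{font_size}px base font: {px_to_em(540, font_size):.2f}em")
```

A column specified in em keeps the same number of characters per line whatever the font size, whereas a fixed pixel width shrinks in em terms as the reader's font grows.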

My personal feeling is that we should:

  • Replace Layout 3's CSS with the Proposed Layout's, then scrap the Proposed Layout.
  • Scrap Layout 4 as redundant to Layout 2

Inductiveloadtalk/contribs 10:42, 1 March 2021 (UTC)

Layout 2 was the one I've used most often if I have set one.
I also note various layouts set up directly using <div class="prose">..<div class="pagetext"> and others from earlier periods of Wikisources development, are there plans to deprecate those so they can be removed?
Additional layouts might eventually be needed for things like playscripts.. (see {{stagescript/s}} for example), The intent with that template was to eventually make it so the script formatting could be changed by user preference, like with dynamic layouts. ShakespeareFan00 (talk) 11:20, 1 March 2021 (UTC)
@ShakespeareFan00: <div class="prose">..<div class="pagetext"> are separate, but allowing dynamic layouts to work on non-scan-backed pages is also on my hit-list, but it will have to come later. At that point those classes would be fully obsolete.
Additional layouts are fine (by me, at least), but we can come to that later. For now, I just want to know if we can trim the existing layouts. Inductiveloadtalk/contribs 11:29, 1 March 2021 (UTC)
I passionately loathe all except No. 1 to the point that I don't work on texts with any of the others forced on me. I don't believe that we should be dictating to the end user a constrained width of what's displayed. By all means, have a couple available for readers to use as they choose. Every work that we make available should work and behave under Layout 1 (including playscripts). It doesn't help that I find the others to be ugly and stale in their design. Also, sidenotes are bad enough in Layout 1, they're dreadful in the others. Beeswaxcandle (talk) 17:37, 1 March 2021 (UTC)
@Beeswaxcandle: You are absolutely right that all works should render as well as possible in all layouts. Default layouts are one thing, but if a work is actively broken in any layout, that's a problem. To bang on one of my many favourite drums, making sure that things work in both Layouts 1 and 2 is the bulk of the work of ensuring a work can export (because Layout 2 is not far from the size of the content on a mobile/e-reader screen).
BTW, you can disable the ability of pages to use {{default layout}} to override with "Allow pages to override my dynamic layout preference on a case-by-case basis" in your gadget prefs. Making this a user toggle next to the layout selector is on my list. Inductiveloadtalk/contribs 08:19, 2 March 2021 (UTC)
I use Layout 2 frequently for poetic and dramatic works, where (a) a margin constraint is necessary to keep the text alignment, and (b) serifs are almost necessary to be able to distinguish I, l, and 1 throughout the text (in various combinations as words and abbreviations, e.g., III. vs. Ill. or Roman numeral 3 versus the abbreviation for Iliad). But otherwise, I do not apply layouts at all. --EncycloPetey (talk) 01:20, 2 March 2021 (UTC)
OK, so before this gets archived, any objections to the proposal above, viz.:
  • Replace Layout 3's CSS with the Proposed Layout's, then scrap the name Proposed Layout.
  • Scrap Layout 4 as redundant to Layout 2
This leaves us with the following three globally-enabled choices:
  • Layout 1/2 as they are
  • Layout 3 will change to get a full-width header, the body is the same as it is now
Inductiveloadtalk/contribs 10:47, 22 March 2021 (UTC)
It is a start. This might be moved to the proposal section. CYGNIS INSIGNIS 11:37, 22 March 2021 (UTC)

Why not acceptable

Can someone other than Billinghurst explain why this letter is not acceptable at Wikisource: letter concerning Louis Julius Freudenberg I (1894-1918), a letter held in a state archive.

See https://wwwnet-dos.state.nj.us/DOS_ArchivesDBPortal/WWICardDetails.aspx?CardID=809 --RAN (talk) 21:33, 1 March 2021 (UTC)

Well, I do not consider it suitable to exclude anybody from expressing an opinion on inclusion or exclusion of a work.
We already have Billinghurst's opinion, he wrote "out of scope" when he made the move. I am looking for third party opinions. --RAN (talk) 23:25, 1 March 2021 (UTC)
As for the question itself: The letter was published (on the New Jersey State Archives website) and is verifiable as WS:What Wikisource includes demands. As for copyright licence, I would use {{PD-US-unpublished}}, but I am not an expert on US copyright law. The only reason why its inclusion might be doubted is its content. As it has no artistic value, only its documentary value can be considered, which is not very high, but there is imo some, considering that the person was not just a no-name soldier, but was an object of some local press articles at least. I tend to agree with its inclusion, but I am curious about other opinions too. --Jan Kameníček (talk) 22:28, 1 March 2021 (UTC)
"Artistic value" is subjective, we need objective reasons to include or not include, otherwise people's personal biases skew what we keep and what we discard. What is art, and what is not art is very personal, and museums and archives, are now confronting their past biases. Don't you think it would have been better for Billighurst to bring the move up for consensus before moving the letter from mainspace? Is Wikisource designed solely to contain art? Then why are we attempting to transcribe the entire New York Times up to 1925? You write that the letter's "[value] is not very high", again why aren't we taking the word of the archive that chose to preserve it, and display it at their website. We have too many subjective rules that are selectively enforced. That lets the rules be used to punish people that you don't like or don't get along with. I understand Billighurst has admin rights, but that shouldn't preclude him from gaining consensus first. He shouldn't be enforcing his personal preferences, or punishing people with audits for speaking out against him. --RAN (talk) 22:35, 1 March 2021 (UTC)

That a work is at an archive does not in itself make it notable for reproduction at enWS. The criteria of WS:WWI are rather more specific than that.

Documentary sources are characterized by one of two criteria:

  • They are official documents of the body producing them, or
  • They are evidentiary in nature, and created in the course of events.

If we apply the presence of a document in an archive, then we can reproduce every probate document that has been registered, every land transaction, every police record, we can include every convict court case, and that becomes an unholy mess to manage and curate, and makes our scope extraordinarily large. WWI says we take published works, then expands into what are documents related to more issues of notability. We are not trying to be a family history website, we are not trying to be a local history website, we are not trying to be an archive office.

Verifiability relates to source, and only applies after scope has been confirmed. The copyright issue is another issue in itself, and not one that I was addressing. — billinghurst sDrewth 06:13, 2 March 2021 (UTC)

First of all: "They are official documents of the body producing them" and "They are evidentiary in nature, and created in the course of events." are both listed under Works created after 1925, I think you just randomly cut and pasted two phrases from the page. This work was produced in 1919. It is not a probate document, or a land transaction, or a police record, or a "convict court case", whatever that is. Again, why aren't you getting consensus, before moving documents, since this is not any of the forbidden documents that you just described? You again are using poorly worded, subjective interpretations, and selectively applying them, which feels like harassment. --RAN (talk) 06:26, 2 March 2021 (UTC)
  •   Comment I think this letter is certainly in scope in my book. It's "evidentiary in nature, and created in the course of events", where the event in question is World War I. IMO, this is an interesting document in its own right. Not earth-shattering, but certainly a small window into the past. Since WS:WWI is clear on pre-1926 works with "Most written work (or transcript of original audio or visual content) published (or created but never published) prior to 1926 may be included in Wikisource, so long as it is verifiable. Valid sources include uploaded scans and printed paper sources.", that's rather moot. Eloise Lindauer died in 1935, so there's no issue with the unpublished copyright side of things.
  • As an aside, the work itself could really do with proper scan-backing to an index, and the wikilinks are a little OTT. But honestly, I don't mind the concept of wikilinks, and in fact I like the idea. Perhaps we can come to a different solution, for example making the default interwiki link colour in the text body closer to the surrounding text color? Or improve and promote Visibility.js, which allows the links to be unstyled entirely. Inductiveloadtalk/contribs 08:38, 2 March 2021 (UTC)

So by your reasoning any document created prior to 1926 is acceptable? How many pre-1926 probate testaments would you like to see onsite? Do you think that is our scope? How about a receipt book from 1899? A store ledger from 1852 from the Victorian goldfields? What about the brand plate of a late 19th century piano? The most boring letter ever written by my great great grandfather of his trip to the UK from Australia in the late 19thC. I have those and never would have considered them in scope for WS. All written prior to 1926. They are documents or similar, not published works. The current modification to the policy came about re a discussion about notability, and was worded to address that we were not wanting non-notable documents per [1]. The intent of that change was not to give pre-1923 and post-1923 documents the sort of differentiation that you are mentioning, as the year is clearly about PUBLISHED works.

Wikisource, as the free library that anyone can improve, exists to archive the free artistic and intellectual works created throughout history, and to present these publications in a faithful wiki version so that anyone may contribute added value to the collection.

If we are changing our scope to now have family history type information, and anything written anywhere prior to 1926. — billinghurst sDrewth 12:06, 2 March 2021 (UTC)

Noting the 2009 change to WS:WWI [2] billinghurst sDrewth 12:11, 2 March 2021 (UTC)
WS:WWI is explicit, and it's the most policy-like policy we have. It's not RAN's fault if you don't agree with it. I might add that the letter of policies like "no cross-namespace redirects, even from Author: to Portal:" and "people with no extant works on site should be Portals" is also often used for unilateral action and complaints about people not adhering to the policies. Either what "policy" says goes, or it doesn't. It's not immutable, but, for now at least, it is there.
I'm not sure what that change in 2009 is relevant to. The salient part of the policy (emphasis mine) ("Any written work (or transcript of original audio or visual content) published (or created but never published) prior to 1923 may be included in Wikisource") has been there since 2007 (the referred discussion is Wikisource:Scriptorium/Archives/2007-09#Change_inclusion_policy), more than a year before your linked discussion. The change to the current "Most written" happened in 2012 and the narrowing was apparently related to copyright, not notability. Furthermore, before that it said "[Documentary sources] may range from constitutions and treaties to personal correspondence and diaries." We can argue whether or not this letter is truly under that definition, but there's certainly a case to be made that it is. I'd say, since WWI was kind of a big deal, it would be. It looks to me like this kind of thing has been explicitly allowed for over 15 years.
And, to be quite honest, all your examples sound fine to me, as long as they're scan-backed and formatted sensibly. I find it hard to get upset about too much material, only poor quality material. One man's rubbish and all that. Ephemera is a whole, perfectly valid, historical field in itself. Inductiveloadtalk/contribs 13:04, 2 March 2021 (UTC)
  • Without digging too deep into every single argument presented in this discussion, I think I agree with Inductiveload's perspective on this. If an editor is putting up high-quality scan-backed and externally verifiable material, who am I to say that it doesn't count? There are literally hundreds of books with scans uploaded to WS that people arguably don't care about, so truly what is the difference between a published book and an archived letter for the purposes of establishing "notability"? As long as the allowance of ephemera doesn't make Wikisource un-navigable (which I don't think it would beyond its present state), I'm in support of the inclusion of any published work/archived work so long as there are editors to maintain it. -- Mathmitch7 (talk) 16:44, 3 March 2021 (UTC)
  • Furthermore, "Published" is a very crude proxy for "notable/interesting/useful". Considering the immense quantity of published newspapers, periodicals of various degrees of specialism and general governmental/official effluent, I'd say that if something is interesting enough to someone that they'll spend the time and effort to present it (and well!) at WS, it's already proved that it's of more interest than, say, a classified ad offering a sewing machine for sale in Caspar, Wyoming. Inductiveloadtalk/contribs 16:57, 3 March 2021 (UTC)
Oh, and noting Wikisource:For Wikipedians has a particular statement that there is a notability requirement for documentary sources, and this exists from page creation in 2011. — billinghurst sDrewth 06:11, 15 March 2021 (UTC)
  • I'm going to be singularly unhelpful and say that I agree entirely with Billinghurst in this individual case, but that I don't think it is reasonably possible to read that out of WS:WWI (e/c, but otherwise essentially what Inductiveload said above). It's written as an unholy hybrid of a user help page and a policy, and as such it fails at being either. Unless you were intimately involved in the discussions around the time it was written it is impossible to figure out how it applies to the situation at hand. For example, coming at it without context, a straightforward reading of that page would lead one to believe that anything created prior to 1926 is in scope so long as it is 1) verifiable (anything held in an archive is verifiable for this purpose) and 2) public domain. That is of course ludicrous as an inclusion criteria, but it is the plain reading of that page.
    On that basis I'm tempted to be unconstructive and say that until we actually fix our inclusion policy we really can't make any blanket statement and have to have a discussion for every single work individually. Very tempted. But as that would achieve little beyond annoying almost everyone I'm going to settle for my standard plea that we give some priority towards developing proper policies for the core issues, scope and inclusion—exclusion criteria foremost among them. I'm happy to help drive that, but only so long as the community actually wants a real policy framework for this. Based on every indication I have so far the community actively does not want more or better defined policy, and certainly has no appetite for the effort involved. I live in eternal hope that one day I will turn out to be wrong about that. --Xover (talk) 13:37, 2 March 2021 (UTC)
    While I agree that WS:WWI is far from perfect, it is the only guidance we have here at the moment and so until the community decides to change it we cannot blame anybody for following it. Of course a community vote can overrule it in this (or any other) specific case, but it was very unfortunate that the letter was removed from the main namespace without a proper discussion at Wikisource:Proposed deletions, which is exactly what we have that page for. Had that been done, we could have been spared a lot of bitterness now.
    So my suggestion is to return the work back into the main namespace for now, and delete it only if the community decides to overrule WS:WWI in this particular case.
    At the same time I absolutely agree with the need to refine WS:WWI, and although I do not think that the points raised by Billinghurst here apply to this specific case, I understand them generally and I will definitely take part in discussions trying to implement them into the rule in a sensible way. --Jan Kameníček (talk) 21:00, 2 March 2021 (UTC)
  • First, I think it problematic that this wasn't taken to Proposed Deletion instead of summary deletion. Secondly, WS:WWI doesn't support the deletion here, and I don't think it was intended to. I was told, I don't recall by whom, that the then-1923 line was good enough, because the time alone would be enough to keep people from posting vanity junk. I can't say I'm always stunned by the value of what other people chose to work on, and I lack any evidence that there's enough of a problem with people posting pre-1926 works problematically to justify arguing about which vintage works are acceptable.--Prosfilaes (talk) 08:16, 3 March 2021 (UTC)
    @Prosfilaes: The created work at enWS was never labelled, and still is not labelled, as being from an archive. I will note that the actual images are so labelled, though the images are not immediately evident, and Commons inclusion criteria are different from ours. So the action that I took has to be seen in that perspective. — billinghurst sDrewth 10:43, 3 March 2021 (UTC)
I still do not see a rationale based on !Wikilaw for you finding the document ineligible for Wikisource. So far you have only used the slippery slope fallacy, that if this letter is eligible, it will allow "every probate document that has been registered, every land transaction, every police record, … [and] every convict court case." I still get the vibe it has more to do with personal animus toward me after I complained that you were using your admin rights to enforce your personal tastes. --RAN (talk) 15:00, 4 March 2021 (UTC)

Interaction ban proposal

  • Can I also suggest an interaction ban with Billinghurst, someone else should be patrolling my entries, if they need patrolling. Someone that does not have the record of enforcing their personal preferences as !Wikilaw. As I reported above he was imposing his THIS IS A DRAFT!!! as !Wikilaw and instead of acknowledging it, his response was to remove the THIS IS A DRAFT!!! title, instead of having a proper consensus !vote. As I also pointed out there is a half-dozen ways that newspaper entries are being presented, and there seems to be no rush to harmonize them. When there is a rush to change my entries I get the vibe that there is some personal animus involved, hence the inquiry about an interaction ban. I feel like I am getting a punitive audit from Billinghurst for bringing up the previous THIS IS A DRAFT!!! interaction. --RAN (talk) 22:16, 1 March 2021 (UTC)
  • can we trout both of you?
  • "So by your reasoning any document created prior to 1926 is acceptable? " i love the parade of horribles; could you please not do enforcement with RAN, he is mostly harmless, if obstreperous. how are you going to decide which manuscripts to include? you understand the Smithsonian is transcribing Freedman's Bureau log books, and LOC is transcribing Women's Party meeting minutes.
  • but "!Wikilaw...THIS IS A DRAFT!!!" you are well aware this is how admins behave. you are lucky he did not make a filter just for you, as some admins do. if you could organize a task flow, with a newspaper (i.e. [3]) rather than clippings, or pitch in at POTM, you would have more sympathy. Slowking4Rama's revenge 23:40, 2 March 2021 (UTC)
  • @Slowking4: I think I mentioned earlier on this page, there are half a dozen ways newspapers are being organized, and an equal number of ways clipped articles are being named and formatted. I have seen the New York Times portal, but even that is a hybrid mixture of formats and clipped articles using various naming schemes. What do you consider the best formatted newspaper collection at Wikisource? --RAN (talk) 23:59, 2 March 2021 (UTC)
  • the fact that newspapers are a mess, with raw clippings from 2012, is not a good argument to spread the mess. i would like to see a migration from non-scan back clippings, to scan back works using chronicling america. https://chroniclingamerica.loc.gov/ a portal sweep up of clippings is a start, but the scans need to be incorporated. Slowking4Rama's revenge 12:32, 3 March 2021 (UTC)
  • filters are routinely used to block youtube, stopping work when government documents use them as references. but i guess the "admin may i" works for you. it is not a leap, to imagine RAN being summarily labeled as a "spammer". Slowking4Rama's revenge 12:10, 3 March 2021 (UTC)
    • That filter was actually turned off, because, recently, Youtube links are becoming more common in WS works than in spam (and we have more granular abuse filters now, partly because a fixed blacklist is a blunt tool, so we can target actual spam more effectively). This adversarialism is ugly. Just because you've received bans elsewhere doesn't mean all admins here are power-tripping maniacs. And no one is calling anyone a "spammer", and the suggestion that they might is, IMO, entirely unjustified in this context. Inductiveloadtalk/contribs 12:46, 3 March 2021 (UTC)
      FWIW technically Mediawiki:Spam-blacklist and Special:AbuseFilter are different things. So trying to add some clarity and difference, not stir the pot. — billinghurst sDrewth 10:49, 4 March 2021 (UTC)
      • filters and blacklist are opaque and summary processes. veteran editors are astonished at their appearance. you would need to provide a feedback page for each one, and some oversight and reporting. for instance, where and when did you decide to change youtube blacklist? you gave no indication this was possible in the past. this "adversarialism" started with abusive admins bearing false witness. RAN has been abused elsewhere; therefore, it is entirely possible he would be abused here as well. Slowking4Rama's revenge 05:09, 5 March 2021 (UTC)
  • if i might suggest a way forward: put on a maintenance category - "newspaper clippings not scan backed"; organize clippings by work portal, listed by date; find and upload scans of newspapers to commons; create indexes that include newspaper clipping, "newspaper clippings scan backed"; strongly encourage editors to follow this process. this would be better than exclusion criteria, by pivoting editors to doing better work. Slowking4Rama's revenge 18:27, 5 March 2021 (UTC)

Disambiguation downloads

Is there any way to deactivate the big "Download" button on disambiguation pages, or do we want those to be downloadable? Visually, the button implies that the disambiguation page is a work in its own right. --EncycloPetey (talk) 01:16, 2 March 2021 (UTC)

It should be easy to code to deactivate, they have __DISAMBIG__ wikicode and are automatically categorised. I don't think that people would need to download, though at the same time it is out of the way of the body so not causing issues. — billinghurst sDrewth 06:15, 2 March 2021 (UTC)
This is phab:T273708 and the idea is that when done it will provide a magic word that we can add to templates like {{disambiguation}}. Inductiveloadtalk/contribs 20:14, 3 March 2021 (UTC)

Addendum: The same issue applies to Versions pages. --EncycloPetey (talk) 21:10, 15 March 2021 (UTC)

  This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. Leave this with the phabricator ticket for further resolution. — billinghurst sDrewth 01:23, 10 April 2021 (UTC)

Wikilinks

When I asked previously about Wikilinks, I was told not to overlink, and no interpretive links. The example given was linking "Lewis Carroll best known book" to "Alice In Wonderland" because it is a subjective interpretation that can change over time. Why are my links to real objective people and real objective places being deleted? Again, this feels like tag-team harassment. Especially when I am probably going to be the only person on Earth using the entry for research over the next decade. --RAN (talk) 06:35, 2 March 2021 (UTC)

The links that I removed were all going off-site and not here on Wikisource. A Wikisource page should have minimal linking in the text, except to authors and other works. Links within text to Wikidata items are outside the intention of the Wikilinks policy (which was written before Wikidata existed). Links within text to Wikipedia are covered in the policy. "Myrtle Avenue" (to pick a random example) can reasonably be assumed by the intelligent reader to be a residential street based on the context. I intended my edit to be an exemplar of good practice, but I see you have just pasted your version over the top of my edit and removed the layout formatting I had done at the same time. Beeswaxcandle (talk) 07:55, 2 March 2021 (UTC)
Fiction and non-fiction should not be treated the same. No links may be good for fiction, but for non-fiction news articles, Coroner Rollins really is somebody, there is no speculation, it is not a roman à clef like Primary Colors where people endlessly speculate who corresponds to the fictional person portrayed. I can see where the London of Alice in Wonderland, may not be the real London, it may exist only as a fictional London. Non-fictional news-articles are different, and the reader loses, when the real historical Coroner Rollins is not identified and linked to his entry in Wikidata. I can search for all the entries for "Coroner Rollins" even if he is called "Aaron Burr Rollins" in another article or called "Sheriff Rollins" using insource. Fictional speculation should not be treated the same as identifying real people in news-items. The reader should get help and identify Moscow as "Moscow, Idaho" and not "Moscow, Russia" or "Moscow, Tennessee" or "Moscow, Kansas". People read fiction for entertainment, people read the news-articles because they are researching someone for a Wikipedia article or a Wikidata entry or a class paper, or a Ph.D. thesis. The links are part of being scholarly. No one is forced to click through to Wikidata, I have been reading from the Wiki Universe for about 20 years and I never felt compelled to click on a link I had no interest in, or was distracted by a word in blue instead of black. But I have come across places like "Myrtle Avenue" (to pick your random example) in news articles, and wondered if it was the same Myrtle Avenue from another news article I read earlier. --RAN (talk) 13:26, 2 March 2021 (UTC)
I have on my todo list trying to gain support for revisiting our linking policy. It is confusingly written and as written it excludes links to Wikipedia that were actually explicitly supported in the discussions leading up to it. I want to revise it to allow links to Wikipedia but to limit under what circumstances they can be used. It would seem sensible in such an effort to also address Wikidata and interlanguage links to other Wikisources. Possibly to explicitly outlaw them, but also possibly to explicitly allow them under certain constraints. For Wikidata, for example, we might require the use of a template that has logic like "Link to Author: page if it exists, otherwise to Wikipedia article if one exists, otherwise add a tooltip with the Wikidata ID or a summary card of the Wikidata entity, and add the page to a tracking category" (randomly picked off the top of my head; community discussion would be needed to determine the details). My main goal would be to make the policy much clearer as to what is allowed, what is not allowed, and why. --Xover (talk) 13:47, 2 March 2021 (UTC)
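The fallback order floated in the comment above could be sketched roughly as follows. This is only an illustration in Python, not an existing template or module: the `Entity` record, the `resolve_link` function, the returned dictionary shape, the example author name, and the placeholder ID `Q000000` are all hypothetical, and it assumes the tracking category applies only when no local or Wikipedia target exists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    """Hypothetical record of what is known about a linked Wikidata item."""
    qid: str
    author_page: Optional[str] = None      # local Author: page name, if any
    wikipedia_title: Optional[str] = None  # Wikipedia article title, if any


def resolve_link(entity: Entity) -> dict:
    """Pick a link target using the suggested order of preference."""
    if entity.author_page:
        # First choice: keep the reader on Wikisource.
        return {"kind": "author", "target": f"Author:{entity.author_page}", "track": False}
    if entity.wikipedia_title:
        # Second choice: the Wikipedia article.
        return {"kind": "wikipedia", "target": f"w:{entity.wikipedia_title}", "track": False}
    # Last resort: no link, just a tooltip with the Wikidata ID,
    # and flag the page for a tracking category (assumed behaviour).
    return {"kind": "tooltip", "target": entity.qid, "track": True}


# Placeholder ID and names, for illustration only.
print(resolve_link(Entity("Q000000", author_page="Charles Sumner")))
print(resolve_link(Entity("Q000000", wikipedia_title="Myrtle Avenue")))
print(resolve_link(Entity("Q000000")))
```

Keeping the decision in one place like this means the order of preference can be changed after community discussion without touching individual transcriptions.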
I prefer linking to Wikidata because the links are more stable. In Wikipedia people are constantly being moved from name to name, and redirects are not always left behind to create a synonym. Especially where there might be a dozen people with the same name. "John Smith (politician)" might be moved to "John Smith (lawyer)" or "John Smith (New Jersey politician)" or "John Smith (mayor)" and a different person then occupies the previous "John Smith (politician)" or it may become a disambiguation page. Link rot is higher in Wikipedia. Usually the bare minimum information is just what I am looking for, not a full biography. Locations in Wikipedia are more stable. What do you think? Will the new rules allow a choice between Wikidata and Wikipedia by the person doing the transcribing, or will we insist on a Wikipedia link? --RAN (talk) 13:58, 2 March 2021 (UTC)
Without thinking in depth about it I suspect I would actually disagree with you on that, but that would in any case be up to the community to decide iff there was support for revising that policy. As it stands it certainly doesn't allow them. --Xover (talk) 14:24, 2 March 2021 (UTC)
Without wishing to get too bogged down in details, using the WD Q-id probably is the most reliable link target, and because we have JS and Lua access to Wikibase, it means that we can actually be much smarter about where we link to (i.e. fall back through Author/Portal, Wikipedia, then Wikidata, as well as show WS interwikis contextually, which is handy for foreign authors that we won't likely have here for a long old time). I would also say that, unless the user specifically asks for it (e.g. by clicking a Wikidata icon), delivering a user directly to Wikidata, as opposed to, say, Wikipedia, is a last resort, and that keeping them on Wikisource is the preference when possible. Inductiveloadtalk/contribs 15:10, 2 March 2021 (UTC)
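The fallback order described above (Author/Portal page on Wikisource first, then Wikipedia, then Wikidata as a last resort) can be sketched roughly as follows. This is only an illustration in Python, not an existing template or module: the function name `pick_link_target` and the plain dictionary of sitelinks are assumptions made for the example, and a real implementation would presumably be written in Lua against Wikibase. The `enwikisource` and `enwiki` keys and the `w:` and `d:` prefixes follow the usual Wikimedia site IDs and interwiki prefixes.

```python
def pick_link_target(qid, sitelinks):
    """Choose where a link for a Wikidata item should point.

    qid       -- the Wikidata item ID, e.g. "Q0" (placeholder)
    sitelinks -- hypothetical dict of site ID -> page title for that item
    Returns a (kind, target) tuple.
    """
    # 1. Prefer a local Author: or Portal: page so the reader stays on Wikisource.
    local = sitelinks.get("enwikisource", "")
    if local.startswith(("Author:", "Portal:")):
        return ("wikisource", local)

    # 2. Otherwise fall back to the English Wikipedia article, if there is one.
    article = sitelinks.get("enwiki")
    if article:
        return ("wikipedia", "w:" + article)

    # 3. Last resort: link to the Wikidata item itself.
    return ("wikidata", "d:" + qid)


# Usage with made-up data:
print(pick_link_target("Q0", {"enwiki": "Example Person"}))
```

Because the Q-id is the stable key, a page move on Wikipedia or the later creation of an Author: page changes the outcome without anyone having to touch the transcribed text.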
Also, we should bear in mind that Wikidata hardly existed when most of this policy (such that it is) was written, and the ability to use Lua to deal with that data in a more sane way is even more recent. Inductiveloadtalk/contribs 14:02, 2 March 2021 (UTC)
And I agree with the policy of not overlinking: there is no need to link to common words, and you usually only need to link to a person or place name the first time they are mentioned in a news article. I agree with the policy of being vigilant with speculative links within fictional works, per my examples above. --RAN (talk) 14:22, 2 March 2021 (UTC)
  • @Xover: Can you quote me the exact phrase at Wikisource:Wikilinks that bans links to Wikidata? I see a ban on "interpretive links" to things like "favorite book", I can see where two people would interpret "favorite book" differently, or that it may change over time. I see a ban on links external to the Wikimedia Universe. That makes sense because of link rot being detrimental to long term stability. I can see the possibility for controversy over adding links within fictional works. --RAN (talk) 04:18, 4 March 2021 (UTC)
  • @RAN: Sure. It's "Links to Wikimedia-project pages are acceptable and considered to be annotations." [my emphasis]. The key here is that the linking policy says Wikimedia links are "acceptable" and in the same breath that they are "annotations". And annotations are, according to the annotations policy, not allowed in works. In other words, the text that is phrased as if it is saying such links are ok is actually saying they're forbidden. This is incredibly confusing and nearly impossible to figure out without assistance (I had to have it explained to me after I had merrily added Wikipedia links to works for a good long while). --Xover (talk) 07:26, 4 March 2021 (UTC)
Then it is a good time to revisit the policy pages. Two policy pages I am constantly referred to were marked "draft" and "essay". Eventually all lists of laws become self-contradictory when you have enough of them. --RAN (talk) 14:04, 4 March 2021 (UTC)
Also to make it even more of a mess, WS:ANN has never graduated to policy, it's still a proposal! Inductiveloadtalk/contribs 14:18, 4 March 2021 (UTC)
  • @Xover: The most recent ruling based on a consensus RFC was in 2013 and reads: "Wikilinks are annotations and are allowed in Wikisource. The creation of wikilinks is optional, where created they should be based on context, the type of work and the likely reader. There are a number of ways wikilinks could be miss-used (interpretative vs. non-interpretative) and a separate discussion will identify acceptable types of wikilinks." (2013) --RAN (talk) 14:42, 4 March 2021 (UTC)
This is why I hope people will support the interaction ban between myself and the person with admin rights who keeps enforcing their personal preferences as if it were !Wikilaw. There are plenty of other people to patrol new entries that can interact with me. --RAN (talk) 14:47, 4 March 2021 (UTC)
@RAN: As I wrote above, annotations are indeed permitted, but only in specially labelled separate copies of works and only when a complete unannotated version exists. The policy doesn't permit annotations (in the form of the links in question) in regular works. --Xover (talk) 18:14, 4 March 2021 (UTC)
You have said that multiple times, but the ruling makes no mention of the phrase "specially labelled separate copies of works" or anything that resembles that wording. --RAN (talk) 03:12, 7 March 2021 (UTC)
  • Show us an example of where this is done. Showing how it should be done, rather than describing how it should be done, will make it clear to everyone. We are arguing over contradictory wording on multiple RFCs and essays and draft policy pages. One good example will make it clear. I am still not sure if the "complete unannotated version" you are referring to is the scan of the original document, or we are supposed to cut and paste the text twice in every entry, one with links and one without links. Are formatting changes also annotations? Is choosing the text size for a headline changing the urgency of a news article? Is it a type of annotation? Look at War! and War! and War!, the same word but the urgency is different. If we don't match the original exactly, is that annotation? Is not formatting a headline and just using the default ASCII text a type of annotation by changing the urgency of a headline? I am looking through all the news articles and many entries have not been formatted, they are just unformatted ASCII cut and pasted text. --RAN (talk) 23:45, 5 March 2021 (UTC)
One more point. If we ban wikilinking in the body of entries, there is really no point having the entry here. The only point of bringing a text into the WikiUniverse is to be able to link to Wiktionary, Commons, and Wikidata. Almost every text already exists elsewhere on the web via Internet Archive or Google Books. --RAN (talk) 22:49, 13 March 2021 (UTC)

Wikisource:Annotations => to official document

We have had the page sitting there and chanting it as an official document for so long, so I think that it is just time we take off the template at its top. Time to just move on it, or fix it and move on it— billinghurst sDrewth 05:43, 7 March 2021 (UTC)

  •   Support --Jan Kameníček (talk) 13:30, 7 March 2021 (UTC)
  •   Oppose I apologize for this being so long: The document as it is written now is contradictory concerning wikilinks within the body of entries, people are citing it and coming to opposite conclusions. This has created edit wars over adding/removing wikilinks, wasting time. I prefer wording that complies with the RFC located here: "Wikilinks are annotations and are allowed in Wikisource. The creation of wikilinks is optional, where created they should be based on context, the type of work and the likely reader." We now have contradictory policy pages Wikisource:Annotations and Wikisource:Wikilinks with each claiming they ban or allow wikilinks in the body of an entry. The same wording needs to appear in each, so they do not contradict each other. The policy should also address "specially labelled separate copies of works" that User:Xover keeps referring to, but has not been able to give an example of. It would also help if the policy page contained clear specific examples of proper annotations and improper annotations. When we only have words, and no examples, people interpret the words differently. Read the Wikilinks discussion on this very page, above, where multiple people are coming to opposite conclusions based on the wording of the very policy page under discussion as it is written now. We also have to address which has supremacy when two policy pages contradict. We also need a better policy on flagging errors of fact and flagging spelling errors. They should be addressed, but in a way that the annotation is distinguishable from the original source material. Just adding [sic] without telling the reader what the correct word should be leaves the reader in the dark. For instance, the New York Times addresses errors in their online articles, we should have a system here, where it is clear that the correction is not part of the original article, and that it has been added by a Wikisource editor, perhaps at the bottom of the page below the license.
Clearly we do not need these for fiction, but for newspaper articles. If we do not address errors, people will be using the errors as references in Wikipedia and Wikidata. If an obituary states that a person was born in 1880 and we have the birth certificate at Commons and we have that person in the 1900 US census, and those documents use 1878, we should address that at the bottom of the page, in a way that the reader recognizes it is an annotation by a Wikisource editor. --RAN (talk) 19:16, 7 March 2021 (UTC)
  • I know that comparing texts in crude blocks is disallowed, but what about automatically comparing differences between editions? This is quite common in scholarship. Languageseeker (talk) 03:17, 9 March 2021 (UTC)
    Special:ComparePages? — billinghurst sDrewth 11:47, 9 March 2021 (UTC)
  • @Languageseeker: Personally, I'm not philosophically opposed to the idea, but I have yet to see any really good example of that kind of thing. The problems are a pincer of the Mediawiki platform we have being a fairly poor fit for that kind of thing on the one hand, and on the other the fact that creating such a work is actually a pretty major undertaking compared to the value proposition and people generally burn out and abandon their project after the first chapter or so. Also, since we (rightly, IMO) don't allow interpretive annotations in any case, such a comparative work is extremely dry. I imagine a better WMF fit for such a thing might be at Wikibooks or Wikiversity, where a comparator can add extra commentary about the differences. Inductiveloadtalk/contribs 12:18, 9 March 2021 (UTC)
  • I have created what I think can be a model of an annotation pointing out an error-of-fact in an obituary. See Jersey Journal/1914/R. V. Schuyler. If we were to let it go unrecognized, we allow it to be used as a reference for Wikipedia with the error intact. I leave the error intact, mark it with "[sic]" and add a note below the license template, so no one would think it was part of the original text. Most people at Wikisource are transcribing fiction, so this would not be a part of their transcriptions. --RAN (talk) 04:14, 12 March 2021 (UTC)

Funeral notice or a death notice

Would something as small as a 4 sentence funeral notice or a death notice be eligible to be stored here at Wikisource? I see where each entry for a dictionary has its own page, so size shouldn't be the reason for exclusion. A ruling at Wikimedia Commons agrees that they are ineligible for copyright even after 1964, since they consist of publicly available information and do not pass the threshold of originality, as opposed to an obituary which surpasses the threshold of originality. --RAN (talk) 06:51, 8 March 2021 (UTC)

Generally death notices and funeral notices are a compilation, and one would be an excerpt, so the way that I have been handling these at this time is transcribing onto the author talk page, and documenting with source. We can do data collection onto author talk pages and evidence curation, without the requirement to proofread, etc. I did do some differently years ago, and stopped as it wasn't a good practice. — billinghurst sDrewth 12:00, 8 March 2021 (UTC)
An isolated four-sentence dictionary entry without the rest of the dictionary entries and without the index page for the whole dictionary would also be considered an excerpt and would get deleted. So I also do not think that such isolated short notices are eligible for inclusion. --Jan Kameníček (talk) 00:27, 9 March 2021 (UTC)
  • Is that because you consider funeral notices excerpts, or is it because of their brevity? I would assume an excerpt would be a single page of a book, or a single paragraph from a page, something that cannot tell a complete story from start to finish. For instance, a newspaper issue is the entire publication, yet, each individual article can be read from start to finish, telling a complete story. See The New York Times for individual articles from the New York Times. If it is because of brevity, what is the minimum number of words that a Wikisource entry requires? The people who are transcribing the New York Times are transcribing very short advertisements. See The New York Times/1900/12/01/Advertisement—Pennsylvania Railroad. Do these need to be deleted? A funeral notice is a paid advertisement. --RAN (talk) 01:00, 9 March 2021 (UTC)
    No, because it is not an isolated excerpt, they are transcribing the whole newspaper. See my example with the dictionary above. --Jan Kameníček (talk) 01:24, 9 March 2021 (UTC)
So, what would make a funeral notice not an excerpt? A cluster of 5 funeral notices on a page? Or the entire newspaper? Why is a single article from a newspaper, not an excerpt from that issue, yet an advertisement from the same issue is? The New York Times ad I showed is one of 8 excerpts from a 13 page issue of The New York Times from 1900. The rules here can be mind-numbingly opaque based on fuzzily worded policy pages describing "excerpts" and "annotations". Well-intentioned people keep coming to diametrically opposed conclusions when interpreting them. --RAN (talk) 01:36, 9 March 2021 (UTC)
@Richard Arthur Norton (1958- ): This is a cousin discussion to Wikisource:Scriptorium#Inclusion_criteria_for_articles above.
My personal feeling is that a "self-contained article" within a larger work like a newspaper, magazine or journal is acceptable, and the entire issue should not need to be transcribed just for one article of interest.
I would (again personally) say that generally, a single death notice is a section of an article, and the whole "Births, Deaths & Marriages" section of the paper is the "unit" which we'd proofread and transclude at. Creating an entire mainspace page for every 4-sentence entry is too much overhead, IMO. Even language dictionaries often chunk into sections like Chambers's Twentieth Century Dictionary 1908/A Adhere and Dictionary of the Swatow dialect/ba rather than a page-per-entry. {{anchor}} can be used to link directly to a single entry within a page if needed.
You are right that the guidance on such items is badly lacking, but I'm hoping that at the very least the aforementioned discussion can result in something useful. Inductiveloadtalk/contribs 08:45, 9 March 2021 (UTC)
A complete article from a newspaper is not considered an excerpt. There are still excerpts around, and I definitely created some in the early days, though as Inductiveload said they are problematic in nature, especially where someone starts to work on a whole newspaper. Transcribing and recording parts of compiled works is a laborious set of tasks to do properly.

Getting specific text to explain it and getting people to read it is always the challenge. Anyone who wants to build Help:Newspapers or Help:Transcribing newspapers with or without scans is always welcome here to transfer our snippets. Volunteers who can build good help pages have always been in short supply here; we all prefer to work on the content. — billinghurst sDrewth 11:26, 9 March 2021 (UTC)

  • Perhaps we need to distinguish between what is absolute policy (!Wikilaw concerning copyright), and what is considered proposed best practices. !Wikilaw can be weaponized to harass editors. Since I have been organizing the newspaper articles, I can see a half dozen ways that articles are named, and an equal number of ways that newspaper articles are formatted, and just as many ways that newspaper articles are aggregated into portals. I am sure that the researcher only cares about the information contained within the articles, the rest is just esthetics. I think editors here are still experimenting with the most useful and practical way to display information, so that it maximizes usefulness to the end user. --RAN (talk) 13:34, 9 March 2021 (UTC)
@Richard Arthur Norton (1958- ): Not allowing "excerpts" is covered by policy under WS:WWI#Excerpts. However (big however) this policy is enormously unsatisfactory when it comes to things like newspapers, periodicals and collective works. Some attempt is being made to normalise it in Wikisource:Scriptorium#Inclusion_criteria_for_articles.
The problem is that, as policy, the phrasing "...are generally not acceptable" is problematic, as it is subjective and does not clarify why it is "generally" true, but not "absolutely" true. Furthermore, "When an entire work is available as [scan], works are considered in process not excerpts." is unclear as to whether this permits transcluding work to mainspace in any state, as long as there's a scan, or just means unfinished work can sit in Page namespace indefinitely.
"I think editors here are still experimenting with the most useful and practical way to display information, so that it maximizes usefulness to the end user": I think this is absolutely the case. Inductiveloadtalk/contribs 13:58, 9 March 2021 (UTC)
  • Yes, I have seen poorly worded, subjective rules, weaponized to harass contributors. As I pointed out before, if you let people with admin rights delete what they do not like, you end up with selection bias. At one point every portal I created was deleted because an admin person said they were not notable enough for a portal "only famous people get portals" was the reason for deletion. --RAN (talk) 22:45, 13 March 2021 (UTC)
    @Richard Arthur Norton (1958- ): I am sick of this bullshit. You keep making all these false accusations. Show me where I said that? Show me where I did that? Special:DeletedContributions/Richard Arthur Norton (1958- ) shows ONE deleted portal, and that is because there is an author page. You have not been harassed, your edits are patrolled just as anybody else's edits are patrolled. — billinghurst sDrewth 14:29, 14 March 2021 (UTC)
    Just agree to an interaction ban and let someone else patrol my entries. There are plenty of other people patrolling, that do not have the negative interactions we have had. --RAN (talk) 16:26, 14 March 2021 (UTC)
    You will not be getting an interaction ban with Billinghurst so you might as well stop beating that horse. Legitimate complaints about their admin actions can be presented constructively, but the over the top accusations and assumptions of bad faith have to stop. They are starting to tip into harassment territory in their own right regardless of what merit your underlying complaints may have. --Xover (talk) 16:34, 14 March 2021 (UTC)

Visibility now a gadget - long s and external links

The "Visibility" gadget that controls the appearance of {{ls}} in mainspace has been made into a proper gadget. It has a new function: external wikilinks can be toggled on and off as well. This is to try to take some of the heat out of the Wikilinking discussion by allowing readers to choose to see the light blue external links (i.e. Wikipedia, Wiktionary, Wikidata, etc) or not.

The gadget is not enabled by default at present. Documentation is here: Help:Gadget-Visibility. Inductiveloadtalk/contribs 20:26, 8 March 2021 (UTC)

I do like the tool, but I would also like to point out that in fact it is useful only for users contributing to Wikimedia projects (who usually sign in and so can switch the gadget on) and is of no use to common readers. The tool could be useful to readers only if it were possible to turn it on no matter if they have their wiki account or not. --Jan Kameníček (talk) 00:18, 9 March 2021 (UTC)
Second, there should be a toggle on pages with long s to turn it on and off regardless of whether a user is logged in. However, this may be more challenging to implement in practice. Languageseeker (talk) 03:15, 9 March 2021 (UTC)
Making this available to all users, logged in or not, by default is as simple as adding default, but I didn't want to do that without any consultation. Inductiveloadtalk/contribs 06:53, 9 March 2021 (UTC)
  • In 15 years of reading in the Wikipedia Universe, I have never felt compelled to click on a blue link, unless I had an interest in learning more. I don't think an unregistered user would feel any more, or less, compelled than me. I think the people most interested in not seeing blue links are long term registered users. --RAN (talk) 13:26, 15 March 2021 (UTC)

Does anybody maintain the IA upload tool?

I've been having many issues with the IA upload tool and I've opened phab tickets for them (task T276222, task T276648). However, they seem to be just sitting idly. How long does it usually take to fix the tool? Languageseeker (talk) 01:47, 9 March 2021 (UTC)

It depends, sometimes months, sometimes years :-( --Jan Kameníček (talk) 12:18, 9 March 2021 (UTC)
Just like anything else in phabricator. Sometimes you just have to befriend a developer the right way. It is why there is a wishlist survey every year or so. — billinghurst sDrewth 21:27, 9 March 2021 (UTC)
Thanks for the replies. A bit frustrating, but that's how it works. Time to make friends with a dev. Languageseeker (talk) 06:12, 12 March 2021 (UTC)

UncategorizedPages on Wikisource

I was looking to see if wikisource had any sources in the Javanese language. A search on the term "Javanese" brought up several hundred pages, however when I tried to find out what category(ies) these pages belonged to I discovered many were not categorized. I then checked Special:UncategorizedPages and I see that the number of UncategorizedPages is quite large.

Just wondering if this is by design, or just lack of enough volunteers to tackle the problem? Thanks in advance, Ottawahitech (talk) 11:42, 10 March 2021 (UTC)

Several factors. Primarily, the special page is not designed for how Wikisource operates. Page: and Index: ns pages (our workspace) typically do not need categorisation; they are not seen as display pages, and generally only need categorisation with maintenance categories where it is required. We would normally only categorise subpages where they are different from the category of the work, so many subpages show. There is no means to have the special category display in main namespace works only nor to filter out subpages. Then often it is wrong as there are categories and it just doesn't sense it properly. Hence we are where we are. — billinghurst sDrewth 21:55, 10 March 2021 (UTC)
Then we possibly have fallen into laziness, and some of the fiction categorisation is not necessarily helpful. Displaying a listing of works in Category:Short stories is of limited value. — billinghurst sDrewth 21:59, 10 March 2021 (UTC)
Note that it would be of substantial value if the OPDS generator could present exportable works organised by category, like Feedbooks does (https://catalog.feedbooks.com/catalog/public_domain.atom). Inductiveloadtalk/contribs 22:36, 10 March 2021 (UTC)
we also do not have a cadre hand-creating and maintaining categories. we could do some things, organize cleanup and search, but would need a quality circle to implement. Slowking4Farmbrough's revenge 15:58, 12 March 2021 (UTC)
Something that can be done right now is to watch New Texts as they are listed on the main page, then add suitable categorization for form, topic, etc. Quite often the only categorization these pages have when they are first listed are the automatic date and copyright categorization. --EncycloPetey (talk) 18:54, 13 March 2021 (UTC)

San Diego Union/1915/Cause of Taliaferro's Death Plunge Mystery to Airmen

I removed the "unlinked" tag from San Diego Union/1915/Cause of Taliaferro's Death Plunge Mystery to Airmen because it is now automatically linked at San Diego Union, however because I used the autolinking process that comes with the periodical header, it will not appear in "what links here". Are we requiring hand-made bulleted links, so articles do not get slapped with the "unlinked" tag? Or is the tag asking me to create an entry for the article in Wikidata? --RAN (talk) 06:06, 12 March 2021 (UTC)

@Richard Arthur Norton (1958- ): I think linking to it from San Diego Union is sufficient, since it can now be reached. However San Diego Union is itself functionally unlinked (the only links come from its own child pages). Probably it should be linked from Portal:Newspapers and Portal:California#San_Diego, as well as categorised to Category:Newspapers of California. Inductiveloadtalk/contribs 11:59, 12 March 2021 (UTC)
Typically I just stick in previous = [[../../]] — billinghurst sDrewth 00:40, 13 March 2021 (UTC)

Deprecation of the Special:Book tool

I think now that the ebook exporter is working rather nicely, we should look at deprecation of the PediaPress Special:Book tool.

It has been functionally broken (the only thing that works is the link to PediaPress for buying a book) for a very long time and, as far as I know, there is no effort being made to unbreak it. As it is, it just clutters the sidebar and de-emphasises the (working) WS-export links.

Furthermore, the Wikisource:Books is confusing, as all the entries are useless when the tool is broken. Inductiveloadtalk/contribs 12:08, 12 March 2021 (UTC)

Category:Pages using duplicate arguments in template calls

Category:Pages using duplicate arguments in template calls has ballooned to over 700 pages, but looking at pages like Federal Reporter/First series/Volume 64 and Delaware Code/Title 4, I am not seeing any sort of duplicated parameters. This leads me to believe that false positives are being generated by templates on these pages, perhaps in the {{Header}} or {{Process header}} or {{Incomplete}} templates. BD2412 T 17:32, 12 March 2021 (UTC)

It will be something related to {{header}} or its modules, as that is where Inductiveload has been making the recent changes. — billinghurst sDrewth
I'm not sure that is true. AFAIK, it's not possible to call a template from Lua with duplicate arguments, because a Lua table can't have duplicate keys and tables are how you call frame:expandTemplate, so it must be an issue higher up than the modules.
Furthermore, very many items "in" this category are not actually in the category when you look at them (including your two examples), so the category page itself appears to be/have been stale. The category itself seems to have shrunk again to ~75 entries, all of which, so far, have had valid issues (or transclude pages that do). I haven't changed anything, perhaps someone else has? Inductiveloadtalk/contribs 13:26, 13 March 2021 (UTC)
@Inductiveload: Module talk:Pagetype/testcases gives every appearance of managing that feat from Lua. I couldn't be arsed to figure out just how it does it, since the thing needs to be taken out back in any case. But if you want to dig… --Xover (talk) 18:32, 13 March 2021 (UTC)
@Xover: OK, so you can do it with frame:preprocess (in this case {'|page=Page talk:Example|page=custom text', 'custom text'},), but none of the modules I've messed with do that, because they call expandTemplate, which takes a table rather than manually constructed wikicode. I haven't always been so tidy with constructing HTML tags, though. Inductiveloadtalk/contribs 00:10, 14 March 2021 (UTC)
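The distinction drawn in this thread, that a keyed table cannot hold the same argument twice while hand-built wikicode can, may be easier to see in a small sketch. The Python below is only an analogy for the Lua behaviour discussed, and the helper `duplicate_args` is a hypothetical name; it assumes a single flat template call with no nested templates or links containing pipes.

```python
from collections import Counter


def duplicate_args(call):
    """List argument names used more than once in a flat template call.

    Assumes `call` looks like "{{Name|a=1|b=2}}" with no nested templates.
    Unnamed arguments are numbered 1, 2, 3... in order of appearance, so
    "{{X|foo|1=bar}}" counts as a duplicate of argument "1".
    """
    inner = call.strip()[2:-2]          # drop the surrounding {{ and }}
    parts = inner.split("|")[1:]        # skip the template name
    names = []
    positional = 0
    for part in parts:
        if "=" in part:
            names.append(part.split("=", 1)[0].strip())
        else:
            positional += 1
            names.append(str(positional))
    counts = Counter(names)
    return [name for name, count in counts.items() if count > 1]


# A keyed mapping silently keeps only one value per key, so a call built
# from a mapping cannot carry a duplicate argument:
args = {"page": "Page talk:Example", "page": "custom text"}
print(len(args))

# Manually constructed wikicode has no such guarantee:
print(duplicate_args("{{Example|page=Page talk:Example|page=custom text}}"))
```

A check like this only finds duplicates that are visible in the wikitext of one call; it says nothing about stale category membership, which is what most of the 700 entries turned out to be.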
@BD2412: (CC Inductiveload since they were pinged) Most of this was just stale links tables, so touching the pages in the cat fixed it. It's possible the stale category association was caused by some of the header changes recently, but that's impossible to tell after the fact. In any case, the cat is down to ~75 entries now, and these appear to be legitimate instances of duplicate arguments. I've done some manual cleaning and these should be eminently fixable by hand. People may also want to watchlist this cat and fix any new entries as they are added. --Xover (talk) 13:27, 13 March 2021 (UTC)

Format help

Hi, can someone show me the correct format for inset text blocks like in the 3 un-validated pages of Index:Creole Sketches.djvu?

P.S. Why am I able to see whether edits are patrolled in Recent Changes? I'm just a User. It's a bit jarring to see the red exclamation mark next to all my edits. Ultimateria (talk) 07:02, 13 March 2021 (UTC)

@Ultimateria: I've set the poems on those three pages for you, but have not changed the page status. As for seeing the patrolled status on Recent Changes: this is a default setting for all signed-in users. I think you can turn it off in Preferences. Beeswaxcandle (talk) 08:07, 13 March 2021 (UTC)

"Gentlemen Prefer Blondes" status confusion

Looking at the index, it appears Gentlemen Prefer Blondes has been completely validated. Yet when you look at where the work is transcluded on site, it says "not proofread", and there is no indicator on site that the text has been completed. I don't understand what happened. (SurprisedMewtwoFace (talk) 13:32, 13 March 2021 (UTC))

@SurprisedMewtwoFace: The "badge" at Wikidata hadn't been updated. I have updated it, and that adds it to Category:Validated texts.
And before anyone asks, yes, in theory it is possible to generate a list of (and/or auto-fix) this case (Index proofread/validated but the mainspace page isn't in the correct category), but it's not a totally trivial task. Not least because not all indexes are connected to mainspace pages via Wikisource index page (P1957), so it would involve traversing the links at Wikisource, or making sure the Index:mainspace mapping is captured at Wikidata as a first step. Inductiveloadtalk/contribs 13:52, 13 March 2021 (UTC)
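Once an Index-to-mainspace mapping is available, the check itself is a simple comparison; the hard part, as noted above, is obtaining that mapping. The sketch below assumes the three inputs have already been gathered by some other means. The function name, the "validated" status string and the sample titles are all hypothetical and are not taken from any existing tool.

```python
def stale_mainspace_pages(index_status, index_to_main, validated_category):
    """Find mainspace pages whose Index is validated but which are
    missing from the validated-texts category.

    index_status       -- dict: Index page title -> status string
    index_to_main      -- dict: Index page title -> mainspace title
                          (may be incomplete, e.g. where P1957 is not set)
    validated_category -- set of mainspace titles currently in the category
    """
    stale = []
    for index, status in index_status.items():
        main = index_to_main.get(index)
        if main is None:
            continue  # no known mapping, so nothing can be checked
        if status == "validated" and main not in validated_category:
            stale.append(main)
    return sorted(stale)


# Usage with made-up titles:
print(stale_mainspace_pages(
    {"Index:Example A.djvu": "validated", "Index:Example B.djvu": "proofread"},
    {"Index:Example A.djvu": "Example A", "Index:Example B.djvu": "Example B"},
    set(),
))
```

Indexes without a known mapping are skipped rather than reported, so the output only ever understates the number of affected works.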
@Inductiveload: Thanks so much! It's much clearer now. The only remaining issue is that it doesn't link directly to the title itself. It appears as "Gentlemen Prefer Blondes" (1926) and the link to the book still isn't active per se through Anita Loos' author page. I think this is because the edition was the 1926 one rather than the 1925 one. However, it should now be clear to everyone that the text is completely validated. (SurprisedMewtwoFace (talk) 14:01, 13 March 2021 (UTC))
@SurprisedMewtwoFace: It should be linked now from Author:Anita Loos. This should have been done right off the bat, but was evidently overlooked. I have also transcluded the copyright page to make it clearer why this is (1926) but the title page says 1925. Inductiveloadtalk/contribs 14:10, 13 March 2021 (UTC)
@Inductiveload: Thanks! (SurprisedMewtwoFace (talk) 14:46, 13 March 2021 (UTC))

Box-Car Children (1924) now completely proofread

@Kathleen.wright5: I have noticed that you have been doing some validating and other work on https://en.wikisource.org/wiki/Index:TheBoxcarChildren1924.djvu This book is now completely proofread and all pages have been input. If you (or anyone else) are interested, the book could use validation for the parts of it that have not yet been validated. Thanks for all your help! (SurprisedMewtwoFace (talk) 21:57, 14 March 2021 (UTC))

Nearing 500k: how do we advertise/celebrate this?

Since we are presently at 498,752 texts in English, that means we are close to a big round anniversary-style number. Do we have any idea how we can use that success to promote Wikisource? A one-off logo? A blog post at Diff? —Justin (koavf)TCM 18:20, 15 March 2021 (UTC)

Both, I think the more publicity the better. Also, could we borrow the idea of a banner showing how many pages were proofread this month from the French Wikisource? It always makes me feel so impressed. Languageseeker (talk) 18:49, 15 March 2021 (UTC)
Actually, we don't have anywhere near that many texts because the count includes sub-pages, most of which are chapters of a text. I haven't been able to think of a way to accurately count how many texts we hold because it needs to be all the root pages that aren't disambiguation pages and some subpages where the work is an anthology that reprints works that were separately published. Then there's the problem of how to count journal and newspaper issues and articles. Beeswaxcandle (talk) 06:08, 16 March 2021 (UTC)
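As a rough illustration of why the headline figure overstates the number of works, the sketch below counts only root pages (titles without a "/") from a list of page titles. The titles and the disambiguation set are made-up sample data, and the function name is hypothetical. It deliberately ignores the anthology and periodical cases mentioned above, so it would not give an accurate count either.

```python
def count_root_pages(titles, disambiguation_titles=frozenset()):
    """Count titles that are not subpages and not known disambiguation pages."""
    return sum(
        1
        for title in titles
        if "/" not in title and title not in disambiguation_titles
    )


# Hypothetical sample data, for illustration only.
sample_titles = [
    "The Box-Car Children",
    "The Box-Car Children/Chapter 1",
    "The Box-Car Children/Chapter 2",
    "Gentlemen Prefer Blondes (1926)",
    "Gentlemen Prefer Blondes",  # imagined versions/disambiguation page
]
root_count = count_root_pages(sample_titles, {"Gentlemen Prefer Blondes"})
print(root_count)
```

Five pages collapse to two works here, which is the same effect, on a small scale, as the gap between the page count and the real number of texts.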

Tech News: 2021-11

23:22, 15 March 2021 (UTC)

Can we download from HathiTrust?

I've previously downloaded books from HathiTrust, but they've changed their setup, so I don't know how to work around their system anymore. Is anyone able to download full books from HathiTrust anymore?--Prosfilaes (talk) 23:50, 15 March 2021 (UTC)

If you do an Inspect element, go to the network tab, and then find the image, you can download it image by image using a batch downloader. The image files are sequentially numbered. Languageseeker (talk) 02:16, 16 March 2021 (UTC)
You sure? I've downloaded them image by image before, but now I'm only getting blobs.--Prosfilaes (talk) 02:20, 16 March 2021 (UTC)
What's the book? Languageseeker (talk) 02:30, 16 March 2021 (UTC)
@Prosfilaes: The web view is indeed now using some kind of "blob" mechanism, so right click and copy image address doesn't work any more. However, the image still downloads as an image. The URL scheme is https://babel.hathitrust.org/cgi/imgsrv/image?id={ID_HERE};seq={N};size=10000;rotation=0, where you set size to something absurd like 10000 to make it give you the biggest size it has.
The Data API also still works (this is my preferred way as it's easier to work out how many images you need and an easier set of endpoints for things like page-number manifests). But you need to register a "UoM Friend" account to get your API keys (free and not very hard).
I have updated Help:Image_extraction#Hathi_Trust since it previously recommended right-click/copy-address for single images. Inductiveloadtalk/contribs 07:39, 16 March 2021 (UTC)
Not sure if I'm missing something as to why it hasn't been mentioned, although the HathiDownloadHelper tool still works for me? https://sourceforge.net/projects/hathidownloadhelper/ Nickw25 (talk) 10:40, 17 March 2021 (UTC)
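For anyone who wants to feed a batch downloader, here is a minimal sketch that builds the list of image URLs from the scheme quoted above. The function names are hypothetical, "example.123" is a placeholder rather than a real volume ID, and it assumes sequence numbers start at 1 and that you already know the page count. It only builds strings and does not download anything.

```python
def hathi_image_url(volume_id, seq, size=10000):
    """Build a page-image URL following the scheme quoted in this thread."""
    return (
        "https://babel.hathitrust.org/cgi/imgsrv/image"
        f"?id={volume_id};seq={seq};size={size};rotation=0"
    )


def hathi_image_urls(volume_id, page_count, size=10000):
    """URLs for pages 1..page_count (sequence assumed to start at 1)."""
    return [
        hathi_image_url(volume_id, seq, size)
        for seq in range(1, page_count + 1)
    ]


# "example.123" is a placeholder, not a real volume ID.
urls = hathi_image_urls("example.123", 3)
for url in urls:
    print(url)
```

The oversized default for size follows the suggestion above of asking for something absurd so the server returns the largest image it has.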

Vertically aligned text

I have been working on Page:EB1911 - Volume 28.djvu/409 which contains a table with text on its side. I thought about using Template:Vrl but it places the text sideways with the start of the text orientated towards the top. The EB1911 table has text orientated with the start towards the bottom. I have a workaround in the table (diff) using style="writing-mode: sideways-lr;", but has anyone written a template to do this?

This is a vrl example
This is a "sideways-lr" example

-- PBS (talk) 15:01, 16 March 2021 (UTC)

I would just make it horizontal ... CYGNIS INSIGNIS 16:51, 16 March 2021 (UTC)
@PBS: So far as I know we don't have any existing template for this (I had a similar case recently and went looking). We should probably have one that allows arbitrary rotation in degrees (i.e. using CSS transform: rotate()), both for flexibility and because the CSS writing-mode property is primarily designed for non-Latin scripts rather than visual formatting (which is what we usually deal with on enWS). --Xover (talk) 18:30, 16 March 2021 (UTC)
There is {{Rotate}}, but it has problems. I don't recommend reproducing most rotated text, rather reserve it for when the special effect is required. We're not aiming to identically reproduce all the nuances of a restricted-size printed page onto a webpage that needs to flow and cope with all sizes of displays. Beeswaxcandle (talk) 04:49, 17 March 2021 (UTC)
yeah, i would go horizontal also. all rotated text does is make the reader crane their necks. Slowking4Farmbrough's revenge 15:32, 22 March 2021 (UTC)
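To make the "arbitrary rotation in degrees" idea concrete, here is a small sketch of the HTML such a template might emit, written as a Python helper purely for illustration. The function name and the exact inline style are assumptions, not an existing enWS template. Note that a CSS transform does not change the space the element occupies in the layout, which may be one source of the problems mentioned above.

```python
from html import escape


def rotated_span(text, degrees):
    """Return an inline-block span rotated by `degrees` via CSS transform."""
    style = f"display: inline-block; transform: rotate({degrees}deg);"
    return f'<span style="{style}">{escape(text)}</span>'


# -90 degrees puts the start of the text towards the bottom.
html_fragment = rotated_span("Total", -90)
print(html_fragment)
```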

Upping the ante: All additions to Template:New texts to have been entered into WD

I think that it is time that the community upped the ante on new texts. I think that we should expect that they all have a Wikidata item, and that we look to have a means to record the WD item into the template. At this time, I don't expect the addition to do anything, though it gives us the scope to work on it, and maybe we can have a means to just say {{new texts/Qnnnnnnn}} and let it just pull the data. I think that we need to be setting up WS to allow better queries and interactions with WD, and this seems a good place to start. Especially if it helps us to link additions up to our twitter feed.

Now to think about whether there is value or ability to migrate our validation dates into a field in WD. — billinghurst sDrewth 00:35, 18 March 2021 (UTC)

Template:Potus-eo should be aligned to Template:Header

We have been actively updating and modernising our header template, and this spin-off needs to be undergoing similar updates, either to be converted to be a subsidiary template of "Header" or it needs to have the components utilised in header added to it. It definitely needs to have Module:Plain sister aspects added to it so that its interwiki links can display automatically. It should have its download aspects compliant. — billinghurst sDrewth 01:34, 18 March 2021 (UTC)

Index page numbering for unnumbered pages

User:Billinghurst and I have a disagreement on his talk page about what to do with how the front matter pages at Bobbie, General Manager (a work that I proofread in full) should be labelled at Index:Bobbie, General Manager (1913).djvu. He calls the labels I give in my index pages, such as "copyr", "dedic", "title", etc., as "manufactured labels" and "butt-ugly" and other things like that. He wants every page that's not numbered in some way by the book itself to be labelled as "—" instead of having a label that tells what it actually is.

I absolutely disagree with that. I'm not sure how you can consider such things as looking atrocious as he does, since I consider them to look extremely nice on an Index page and in transcluded pages. But I suppose aesthetics and beauty in the end are matters of subjective opinion, which is why I decided to bring it up here. He also brought up that User:ShakespeareFan00 is a user who shares my opinion.

If he wants to continue to assert that Index page labels should not be a matter of personal preference, I propose that we instead come to a consensus on how policy (not guidelines pages or "advice") should deal with Index page labels. I'm not sure what policy page would be sufficient for asserting this as a rule though. So, would you guys vote Support to allow Index pages to label copyrights as "copyr", frontispiece pages as "frontis", covers as "cover", or title pages as "title", for example? Or would you vote Oppose, that we should disallow this, and assert that every page that's not numbered should be labelled as "—"? PseudoSkull (talk) 16:06, 18 March 2021 (UTC)

This matter is covered by the "general recommendations" in the yellow box on Help:Index pages#Parameters, which have been in place since 2013. Changes to these conventions would need to be the subject of a formal RFC as they have been in place for so long. Beeswaxcandle (talk) 16:44, 18 March 2021 (UTC)
As per Help:Index pages, pages with text on them should not be labeled "-". I'm not terribly stressed about how they should be labeled, but I find it very important that we can tell blank pages from pages with content on them at a glance of the index page.--Prosfilaes (talk) 01:40, 19 March 2021 (UTC)
I think that it's better to have a standard label for such content than just -. They're not blank pages, but crucial pieces of information. The more specificity the better in my book. If we are to reproduce books exactly, then we need to include all the material. Copyright pages have a very specific and necessary purpose. Languageseeker (talk) 02:20, 19 March 2021 (UTC)


My approach recently has been to consider pages before the numbered pages (1, 2, 3, etc.) to be numbered using lowercase roman (even when not actually done so in the book), based on the convention adopted by countless books that do number these pages directly. The exceptions to these are:
  • "Title", which is identified as such in an imported pagelist from an external source, and image plates, which often sit outside the numbered page range. (Unless there is a half-title or other numbering to go on, I also consider Title and i to be the same page in a work, the pages following the title to be ii, and so on.)
  • "Adv" for advertisement material, which in many works is not numbered. (It can also be considered to be numbered with lower-case roman numerals in some works.)

ShakespeareFan00 (talk) 08:15, 19 March 2021 (UTC)
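The labelling convention described above can be sketched as follows. This is only an illustration with hypothetical function names: it assumes the title page is the first front-matter page (so "Title" stands in for i) and it does not handle plates or advertisements.

```python
ROMAN_NUMERALS = [
    (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
    (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
    (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
]


def to_lower_roman(number):
    """Convert a positive integer to a lowercase roman numeral."""
    if number < 1:
        raise ValueError("roman numerals start at 1")
    parts = []
    for value, numeral in ROMAN_NUMERALS:
        count, number = divmod(number, value)
        parts.append(numeral * count)
    return "".join(parts)


def front_matter_labels(page_count):
    """Labels for unnumbered front matter: Title, ii, iii, ..."""
    labels = [to_lower_roman(n) for n in range(1, page_count + 1)]
    if labels:
        labels[0] = "Title"  # Title and i are treated as the same page
    return labels


labels = front_matter_labels(5)
print(labels)
```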

I am also looking through the pagelists I contributed a while back as part of a cleanup effort, with a view to updating them to any new conventions that are put in policy. However, I would like a consensus on ONE set of guidelines so I am not changing things multiple times as different contributors express different views. ShakespeareFan00 (talk) 10:17, 19 March 2021 (UTC)
@ShakespeareFan00: I don't think you need to be concerned about your past indexes. I for one am grateful you did them at all, and I couldn't care less (at least for my uploads that you generously pagelisted) if the half-title is "iii" or "Half-title". I too will follow whatever rules are put in place, but for now at least, you seem to be well within accepted (and documented) practice. If a review of files is needed, it's not only your files!
Obligatory plug: I have also suggested an addition to the <pagelist/> tag to allow an auxiliary label for this kind of thing: phab:T274740. Comments welcome on it being a terrible idea and why, preferred syntax, etc, etc. Inductiveloadtalk/contribs 10:42, 19 March 2021 (UTC)
  • The timing isn't best for digging into this IMO (there are a few too many other unsettled matters), but I think it's an issue we should devote some attention to at some point. We're using these for multiple purposes, and some of those purposes are in direct conflict. In addition to a solid helping of personal preference, which of these aspects you find most important will determine what approach makes the most sense to you. Mapping from printed page number to index-into-file numbers may be the original purpose, but labelling pages for easy navigation by proofreaders and providing link targets on transclusion are not inherently invalid concerns. All of this can be solved through technical measures, but we need to figure out which aspects are worth solving and what that solution looks like. Nothing is free and some things may be prohibitively expensive. --Xover (talk) 12:31, 19 March 2021 (UTC)
If Billinghurst wants to start an RFC campaign he is more than welcome to. But since this has already been established practice since 2013 (and I didn't realize that when I made this Scriptorium topic), I refuse to start any RFC discussion myself, as I think making any change at all would be a waste of time. Besides just me, anyway, it looks pretty likely that community consensus won't be on his side on this one. PseudoSkull (talk) 13:21, 19 March 2021 (UTC)

Call for review, comment and discuss my PhD thesis on Wikimedia movement

Hello,

Just a short message to call people interested to review, comment and discuss my PhD thesis on Wikimedia movement. All the best, Lionel Scheepmans (talk) 19:35, 19 March 2021 (UTC)

File:Hans Holbein the younger (Volume 1).pdf

Would it be possible for this to be replaced with a djvu scan to match the volume 2 uploaded? 88.97.96.89 09:29, 22 March 2021 (UTC)

here you go c:Help:Converting PDF to DjVu --Slowking4Farmbrough's revenge 15:35, 22 March 2021 (UTC)

Tech News: 2021-12

16:53, 22 March 2021 (UTC)

Question about completed texts appearing under "New Texts"

What factor allows for texts to appear under "New Texts" once they have been completely proofread? I noticed that https://en.wikisource.org/wiki/The_Box-Car_Children didn't appear under New Texts once we completed proofreading of it. Also, https://en.wikisource.org/wiki/Gentlemen_Prefer_Blondes_(1926) didn't appear yet, even though it's been completely validated. Just curious whether New Texts is automated or requires some kind of additional input. (SurprisedMewtwoFace (talk) 19:40, 22 March 2021 (UTC))

While the Index of "The Box-Car Children" shows that all of the individual pages have been proofread, the book has not been assembled from those pages. If you look at the Main namespace page of The Box-Car Children, all of the chapter links are red links. The pages have been proofread, but the book has not been assembled. --EncycloPetey (talk) 20:09, 22 March 2021 (UTC)
That explains "The Box-Car Children", but "Gentlemen Prefer Blondes" is clearly assembled, validated, and has working chapter links. (SurprisedMewtwoFace (talk) 20:14, 22 March 2021 (UTC))
There is no appointed person taking care of all new works to feature them under New Texts. Once somebody has proofread a work, transcluded it into the main namespace and provided it with an appropriate licence, they can add it to Template:New texts. If they do not add it there, nobody else is likely to do so. As Wikipedians like to say: Be bold :-) --Jan Kameníček (talk) 21:09, 22 March 2021 (UTC)
Okay, I just added Gentlemen Prefer Blondes to the New Texts template. Hope this helps! (SurprisedMewtwoFace (talk) 22:04, 22 March 2021 (UTC))
@SurprisedMewtwoFace: Great! Only it should be added to the top, but I have already corrected it. You can also use the "display" parameter to show the exact title of the book if it differs from the name of the page. --Jan Kameníček (talk) 22:34, 22 March 2021 (UTC)

What to name the old version?

I moved the previous contents of The Barbarism of Slavery to HERE. I want to place it back in the main namespace but can't come up with a suitable title. The text is identical, but the old version focused on wiki linking the article to Wikipedia sources, and had no italics or any text modifications we use in proofreading. Suggestions are welcome. — Ineuw (talk) 21:21, 22 March 2021 (UTC)

The scan backed version is good and faithful to the original, I do not think we need the other one which is not scan backed and includes features not present in the original version. --Jan Kameníček (talk) 21:27, 22 March 2021 (UTC)
I don't want to destroy someone else's work.— Ineuw (talk) 21:40, 22 March 2021 (UTC)
Replacing a text by a scan backed version is not destroying, but improving. The old version comes from 2005/2006, and I am afraid this is the final fate of all contributions from that time. --Jan Kameníček (talk) 22:27, 22 March 2021 (UTC)
Thanks for the clarification. In that case, I will delete my sandbox copy.— Ineuw (talk) 23:03, 22 March 2021 (UTC)

Is it possible to add a Download button next to "transcription project" when the text has been Proofread or Validated?

Would it be possible for the small scan link template to generate a Download button once the pages have been transcluded? Right now, it almost appears as if no transcription projects are finished. Perhaps the button could also be colored to indicate the status of the text: Red if not all the pages have been proofread, Yellow if all the pages have been proofread, Green if all the pages have been validated. Also, once the entire text has been validated, would it make sense to change the text from "Transcription Project" to "Transcription Completed"? In this way, users would be able to directly download from an Author's page. Languageseeker (talk) 12:52, 24 March 2021 (UTC)

{{small scan link}} should be removed once the work is proofread, so I'm not sure this exact proposal would add much value.
However, if we were better about setting the proofread (Q20748092) and validated (Q20748093) badges, we could theoretically get this information for the mainspace pages (to a max of 400 links per page, but less than that if the page loads other WD items).
Adding a download button on demand is pending phab:T275003 (though you can fake it with a link directly to the files like {{export}} does, you don't get the pretty new export dialog). Inductiveloadtalk/contribs 13:18, 24 March 2021 (UTC)
I've seen plenty of examples of small scan link buttons not removed. Instead of asking users to manually remove them, which creates more work for users, wouldn't it make more sense to automatically transform them into a download button? Languageseeker (talk) 13:28, 24 March 2021 (UTC)
No, because a work can have had its index be fully proofread and even validated, but that does not mean that the text has been assembled in the Main namespace for download. The proofreading of the pages from the index and the preparation of a complete text for download are two separate processes. And Inductiveload is correct, the small scan link is meant to be temporary and should be removed once the work has been fully transcribed. --EncycloPetey (talk) 20:31, 24 March 2021 (UTC)
Good points. What about changing the small scan link to a download link that would generate the same Download button as on the transcluded page? In that way, it would be possible to directly download a work from the Author's page instead of having to go to a different page just to get a download link. Languageseeker (talk) 22:29, 24 March 2021 (UTC)
That would only work in some situations. Sometimes the listed work is a poem inside another larger work. Sometimes the link takes the reader to a versions page where more than one version of the work is available. Sometimes, a work is moved and the link on the Author page is a redirect. Each of those situations is common and can generate bad results. Having the Download link on the page for the work itself makes the most sense. --EncycloPetey (talk) 00:02, 25 March 2021 (UTC)

Non-ASCII characters in article titles

The Dictionary of National Biography uses date ranges in disambiguation extensions to article titles. There was pressure by some Wikipedia editors to change from dash to ndash to fit in with the Wikipedia policy on this issue. Wikipedia handles the issue by having ndash in the article title and a redirect with a dash.

The use of dashes was justified for Wikisource because it simplifies the URL and that no redirects were necessary.

I am currently working on Wikipedia, linking articles to Wikisource articles in the Royal Naval Biography, and I have just come across these two articles:

There are no ASCII redirects

So should the articles be moved, or should there be redirects, or are the names fine as they are and there is no need for redirects? -- PBS (talk) 14:59, 25 March 2021 (UTC)

@PBS: They should be moved. For punctuation like that (and dashes, quote marks, etc.) we use plain ascii in page names (nb. in page names: article titles are a different story). --Xover (talk) 15:24, 25 March 2021 (UTC)
I am using the term "article titles" to be the part of the URL that is used to make the URL unique. I think you are using the term page name to mean the same thing. -- PBS (talk) 15:29, 25 March 2021 (UTC)
Oh, yes, sorry; I should have been clearer. I just meant it as an aside in case there was confusion, but assumed you knew that so didn't want to over-explain. I should have just left it out. But, yes, the page name, which is what ends up in the URL, uses ascii punctuation. I'm just hedging around "article" since mainspace wikipages on Wikisource can contain zero or more "articles" (with or without non-ascii titles)—from a newspaper, for example—unlike Wikipedia where a mainspace wikipage by definition is "an article". --Xover (talk) 15:56, 25 March 2021 (UTC)

<-- There could be a lot of these. How do I go about requesting a bot to run through the titles? -- PBS (talk) 12:24, 27 March 2021 (UTC)

Make a request at WS:BOTR. Beeswaxcandle (talk) 17:32, 27 March 2021 (UTC)
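For whoever picks up the bot request, the check itself is simple. The sketch below flags page names containing typographic dashes or quote marks and proposes a plain-ASCII name. The exact character mapping and the sample title are assumptions for illustration. It does not move pages or talk to the wiki.

```python
# Assumed mapping; extend it if other punctuation turns up.
ASCII_PUNCTUATION = str.maketrans({
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote / apostrophe
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
})


def ascii_page_name(name):
    """Return the page name with typographic punctuation made plain ASCII."""
    return name.translate(ASCII_PUNCTUATION)


def needs_move(name):
    """True if the page name would change under the mapping above."""
    return ascii_page_name(name) != name


# Hypothetical title with an en dash in the date range.
old_name = "Royal Naval Biography/Smith, John (1790\u20131850)"
new_name = ascii_page_name(old_name)
print(needs_move(old_name), new_name)
```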

Transcluding Index Comprised of Images

How would I transclude an Index comprised of individual page images? I can't seem to get something like <pages index="Brazilian_short_stories" from="5" to="16"/> to work. Languageseeker (talk) 04:10, 27 March 2021 (UTC)

Normally, an Index is created from a single file with all the pages together. I have never seen this approach done successfully for any work with more than 2 or 3 pages, nor have I seen it attempted where the Index page name has no file type extension, nor when the contained pages have file extensions (and these extensions differ from page to page). "Due to the extra complexity and other drawbacks of this process, this is not recommended for anything other than very short works: such as single pages or works of just 2-3 pages in length", as per Help:Index_pages vide "Using individual image files". --EncycloPetey (talk) 05:12, 27 March 2021 (UTC)
yeah, i would go take the individual images, and create a multipage pdf, using the publishing program of your choice, and then upload to commons. my library has ms publisher installed. but the process and prework is not documented well. Slowking4Farmbrough's revenge 23:12, 28 March 2021 (UTC)
I figured it out: you need to use {{page|Image Name|num=Page Name}} or {{page|Image Name}} to transclude. Wikisource is quite slow when handling large PDFs, Chunked Uploader can't handle files over 2 GB, and importing from the IA often results in 503 errors. You can also use the syntax <pages index="Index" from="First Page Image" to="End Page Image" /> Languageseeker (talk) 23:37, 28 March 2021 (UTC)
While using {{page}} is a solution, it is deprecated in most uses. Beeswaxcandle (talk) 08:24, 29 March 2021 (UTC)

Royal Naval Biography Marshall funny missing pages

There are some funny pages in Royal Naval Biography Marshall; most of them seem to be dealt with OK, for example:

This page appears in the index page:

However just before page 40 are two pages which do not appear in the index page:

and consequently the article Royal Naval Biography/Montagu, George includes all the text, but it has strange pagination. It goes:

  • 39, 40, 41, 40, 40*, 40**, 41...

It is mostly correct—apart from the first instances of 40 and 41 which should be 39* and 39**. How can this be fixed so that the two missing pages appear in the index and the pagination is correct in the article? -- PBS (talk) 12:47, 27 March 2021 (UTC)

  Done Fixed by adjusting the pagelist. Beeswaxcandle (talk) 17:31, 27 March 2021 (UTC)

Is there a category for specific typography?

I want to categorize articles containing manicules. See Category:Works with manicules, what larger category should that category be in? --RAN (talk) 03:02, 29 March 2021 (UTC)

This concept is not something that fits into our four-fold categorisation scheme (type of work; genre/subject; when published; licence), so before advising I need to know how such a category would be used. [Maybe, even, why it would be useful here at Wikisource.] Beeswaxcandle (talk) 08:36, 29 March 2021 (UTC)

Tech News: 2021-13

17:30, 29 March 2021 (UTC)

Has there been a Header function change

Am I remembering incorrectly, or did the {{Header}} used to add a navigation bar at the bottom of each page, so that readers would not have to scroll back to the top of the page in order to proceed to the next chapter / section of a text? --EncycloPetey (talk) 04:15, 30 March 2021 (UTC)

@EncycloPetey: Yes, it did. The footers are generated automatically by MediaWiki:Gadget-DisplayFooter.js (loaded by default as part of the Site Gadget) based on the data in {{header}}. It got broken during recent code cleanup here because it depends on the header having at least one next or previous link with the HTML IDs #headerprevious and #headernext. These IDs were lost in the cleanup so the footer script was just bailing out instead of doing anything. It's fixed now, but you may have to purge or null-edit a page before it becomes visible. --Xover (talk) 07:09, 30 March 2021 (UTC)
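The dependency described above can be illustrated with a small check. This is a Python stand-in written for illustration, not the gadget's actual JavaScript: the class and function names are hypothetical, and only the two ID names come from the explanation above. It reports whether a page's HTML carries at least one of the IDs the footer script looks for.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect every id attribute seen in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def footer_can_be_built(html):
    """True if the header exposes at least one of the expected link IDs."""
    collector = IdCollector()
    collector.feed(html)
    return bool(collector.ids & {"headerprevious", "headernext"})


with_ids = '<span id="headernext"><a href="/wiki/X">Next</a></span>'
without_ids = '<span><a href="/wiki/X">Next</a></span>'
print(footer_can_be_built(with_ids), footer_can_be_built(without_ids))
```

With the IDs stripped, a check of this kind fails even though the links themselves are still present, which matches the symptom reported: the header looked fine but no footer appeared.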

Adding portal categories for the Ottoman Empire?

I noticed Corps de droit ottoman/Preface needed to be linked to some pages.

From Portal:Index#Class_K_-_Law I see various portals, and Portal:Law/Subclasses has a lengthy list. However I'm not sure what country code to assign to the Ottoman Empire.

Note the work was written by a British man and published in the UK, but it concerns the Ottoman Empire. Only the English-language preface is on the English Wikisource because most of the work is in French. This is public domain under US copyright, but because of its UK copyright the French language content can't yet be uploaded to the French Wikisource (as that project considers British copyright law) WhisperToMe (talk) 05:33, 30 March 2021 (UTC)

You can find classification codes listed in Library of Congress Classification. Because the system was initially developing prior to World War I, and this is an update to that system, some listings will be grouped oddly. "Turkey" is the correct place for Ottoman Law, and the code is KKX. --EncycloPetey (talk) 22:11, 31 March 2021 (UTC)

scan quality in pdf

I notice the scan quality of the images in this pdf is a good deal worse than the images at the source. I initially thought it was an overly compressed djvu, where a 'c' character might be substituted for an 'e'; an example is the correction I made here which does show 'which' quite clearly at the source. CYGNIS INSIGNIS 05:03, 31 March 2021 (UTC)

They are indeed. But the original scan quality isn't great either. The scan is relatively low resolution, partially out of focus, and has been aggressively compressed. IA has then downsampled those images and recompressed them even more aggressively (4.4MB of scan images have been crunched down to a 672kB PDF; for reference a single scanned page alone should be somewhere between 600kB and 1MB). On top of that MediaWiki will extract the page using ghostscript and then reencode it into a 1024px wide "thumbnail" that is displayed in Proofread Page. At that point it's been recompressed three times, each time with generational loss. Frankly, it's amazing the results are legible at all.
If you need it for anything I can make you a DjVu that preserves as much quality as possible of the original scans, but as mentioned they aren't great to begin with. --Xover (talk) 05:49, 31 March 2021 (UTC)
It appeared on new texts, and I corrected a couple of scannos while reading it. The transcript is close to perfect so not sure it is worth the bother, although having a better djvu in situ may help with validating the work. CYGNIS INSIGNIS 09:15, 31 March 2021 (UTC)
FYI, this script might help, since, as well as adding handy links to pages, it adds an ability to load high-res images directly from the IA or Hathi into the Proofreadpage image pane: User:Inductiveload/jump to file. Inductiveloadtalk/contribs 09:47, 31 March 2021 (UTC)
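For a sense of scale, the figures quoted above work out as follows. This is just the arithmetic on the two sizes given (assuming 1 MB = 1000 kB); the variable names are mine.

```python
original_kb = 4.4 * 1000   # 4.4 MB of scan images, assuming 1 MB = 1000 kB
pdf_kb = 672               # size of the resulting PDF

ratio = original_kb / pdf_kb
print(f"The PDF is about {ratio:.1f}x smaller than the scan images")
```

In other words the whole PDF is roughly the size that a single well-scanned page would normally be, going by the 600kB to 1MB figure mentioned above.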

Petition of John Wilkes to the House of Lords

Do we have any experts in 18th Century cursive, who would be willing to check the short, single-page Petition of John Wilkes to the House of Lords, 1768, please? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:04, 31 March 2021 (UTC)

  Done --EncycloPetey (talk) 02:58, 1 April 2021 (UTC)

New toys in 1.36.0-37 - work-specific CSS and fixed DjVu paragraphs

There are a few new features in ProofreadPage in today's deployment of MediaWiki 1.36.0-wmf37:

  • Index page styles (work-specific styles): you can now make a CSS page at Index:Foo.djvu/styles.css (or .pdf) and it will be applied to all Page namespace pages and any transclusion using the <pages/> tag. There are some more details at Help:Page styles.
  • DJVU paragraph control characters are now finally parsed properly, so you should see paragraph breaks in the text layers of new DJVUs. I don't think it will apply to existing DJVUs until someone runs an updater script on the server.

Inductiveloadtalk/contribs 21:55, 31 March 2021 (UTC)

Author description

Can someone show me the rule that demands that the Author description has to be just a few words like "American writer" and cannot be a few sentences telling where and when they were born and where and when they died? Is this a strict !Wikilaw or is someone imposing their personal preferences? If you look at all the authors listed in VIAF and LCCN and the list of authors at Project Gutenberg, wouldn't you want to know more information about them to properly disambiguate them? In some cases there are a dozen people with the same name, some are duplicates, but there is too little information to know. At Wikidata people with the same or similar names get conflated every day through bad merges, some of the entries have to be abandoned because there is no way to determine who-is-who anymore, then VIAF copies our bad data. See for example wikidata:Wikidata:VIAF/cluster/conflating_entities for people irreparably conflated. I think the "American writer" description is what you would add if the name is already recognizable to anyone with a high school education, so there is no chance of conflation or improper disambiguation. Most people writing newspaper articles for local papers would not fit this category of recognition. --RAN (talk) 02:15, 1 April 2021 (UTC)

Not everything at Wikisource is written down as "rules". Wikisource Author descriptions are not the place to provide an author's biography. Wikisource does not create encyclopedic content. That is what Wikipedia is for. Wikidata has an organizational project page where problematic VIAF IDs are listed and the issues noted. We do not need to duplicate that function here.
Here, we typically list nationality, field of writing/study/work, and additional author-relevant items such as a pen-name or a link to a spouse who is also a writer. If there are multiple people sharing the same name, then a Disambiguation page is created to disambiguate them. The description itself should be deftly written to allow distinction, but not verbose. For example, Author:George Van Santvoord (1891-1975) is "American literary scholar and professor at Yale", the minimum necessary to distinguish him from another author George Van Santvoord (1819–1863), who was an "American scholar of US government". The field of work and place of employment are enough to distinguish the two. For father and son, when they are both authors sharing the same name, we may state "son of [father]" or "father of [son]." Again a deft and simple description, not a biography. Biographic information can be stored at Wikidata, and if there is enough for an encyclopedic entry, that belongs on Wikipedia.
If you believe you have found sources for biographical details necessary for distinguishing two authors, and that information cannot be placed at Wikidata, that information can be (and often has been) placed on the Author_talk page associated with that Author. --EncycloPetey (talk) 02:48, 1 April 2021 (UTC)
smdh. or how i learned to stop worrying and love the wikidata. Slowking4Farmbrough's revenge 01:19, 4 April 2021 (UTC)

Disambiguation pages for chapter numbers

I recently created the page Chapter 4, as a disambiguation page listing every fourth chapter in every book in existence here. I think we should create disambiguation pages for all examples of these, because 3 people on all of planet Earth might not know how to read a table of contents, or go to a subpage manually, or go to Special:WhatLinksHere, or any other DEFINITELY less useful methods than that for getting to the fourth chapter of a book.

It should be pretty easy for people to scroll through the entire page to find what they're looking for, even if they can't use CTRL F. People with slow connections definitely won't have to wait for the page to load for very long at all. And best of all, if pages or subpages are moved in a book, it will be EASY AS CAKE to deal with, with all the chapter disambiguation pages there'll be.

[[:File:Max Headroom broadcast intrusion.webm|Here's some past discussion of this very idea,]] right here in the Scriptorium, 2008. PseudoSkull (talk) 15:47, 1 April 2021 (UTC)

The page is not as described and is actually a link farm to several webm files. The link to the past discussion above is also false (hence my adjustment). I assume therefore that this is an April Fool's joke and have accordingly deleted the page as "not notable". If you were serious about the concept, then open a discussion. Beeswaxcandle (talk) 17:49, 1 April 2021 (UTC)
You young whippersnapper   Inductiveloadtalk/contribs 21:59, 1 April 2021 (UTC)

Edit the text in a PDF?

I've been trying to find a good way to edit the text in a PDF file prior to converting to DJVU and/or uploading to Commons, ideally with free software. For instance, to remove the words "Digitized by Google" from every page in the OCR text. Does anybody have any tips for this? -Pete (talk) 22:35, 1 April 2021 (UTC)

@Peteforsyth: Have you looked at Pdf Shaper free? — Ineuw (talk) 11:18, 2 April 2021 (UTC)
@Ineuw: Great suggestion, I had not encountered that one. I'm on my Linux computer right now, but I'll be sure to check this out when I'm back on a Windows box. -Pete (talk) 21:14, 2 April 2021 (UTC)
@Ineuw: Seems that the free version permits one to extract the text layer all as one text file, which is very useful, but not what I'm looking for. I don't see an ability to edit the text layer. I might give the free trial of the pro version a try, and see if it's in there. Thanks again for the suggestion. -Pete (talk) 06:39, 3 April 2021 (UTC)
@Peteforsyth: what exactly is the workflow here? Specifically, what is the input data (IA PDF, Google books PDF, raw images, ...)? If DjVu is the desired output, you can handle the text layer much more easily at that point, since DjVu has a set of tools for handling the text layer (djvutxt and djvused), and the text layer is a well-defined s-expression format.
Analogously to djvutxt, PDF text streams can be extracted in a similar format (including line and "block" data) with:
pdftotext -bbox-layout in.pdf out.xml
However, once you've figured out a heuristic for nixing the right words in the XML and made the edit, I do not know how to re-insert modified data like djvused can (though I'm sure with enough grobbling about in the PDF stream references you could manage it). Inductiveloadtalk/contribs 13:31, 3 April 2021 (UTC)
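For anyone who wants to try the XML route, here is a minimal sketch in Python of the "nixing the right words" step. It assumes the -bbox-layout output nests page, flow, block, line and word elements; the sample string below is a hand-written approximation, not real pdftotext output, and the function names are made up for illustration. It only edits the extracted XML and does not write anything back into the PDF.

```python
import xml.etree.ElementTree as ET

# Hand-written approximation of `pdftotext -bbox-layout` output (an assumption,
# trimmed for brevity; real output is wrapped in an XHTML document).
SAMPLE = """<doc>
  <page width="400.0" height="600.0">
    <flow>
      <block xMin="10.0" yMin="10.0" xMax="200.0" yMax="20.0">
        <line xMin="10.0" yMin="10.0" xMax="200.0" yMax="20.0">
          <word xMin="10.0" yMin="10.0" xMax="60.0" yMax="20.0">CHAPTER</word>
          <word xMin="65.0" yMin="10.0" xMax="80.0" yMax="20.0">IV</word>
        </line>
      </block>
      <block xMin="10.0" yMin="580.0" xMax="120.0" yMax="590.0">
        <line xMin="10.0" yMin="580.0" xMax="120.0" yMax="590.0">
          <word xMin="10.0" yMin="580.0" xMax="50.0" yMax="590.0">Digitized</word>
          <word xMin="55.0" yMin="580.0" xMax="65.0" yMax="590.0">by</word>
          <word xMin="70.0" yMin="580.0" xMax="120.0" yMax="590.0">Google</word>
        </line>
      </block>
    </flow>
  </page>
</doc>"""


def local(tag):
    """Return the tag name without any XML namespace prefix."""
    return tag.rsplit("}", 1)[-1]


def strip_watermark(root, phrase="Digitized by Google"):
    """Remove every <line> whose words are exactly `phrase`; return the count."""
    target = phrase.split()
    removed = 0
    # Materialise the element list first so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if local(child.tag) != "line":
                continue
            words = [(w.text or "").strip() for w in child if local(w.tag) == "word"]
            if words == target:
                parent.remove(child)
                removed += 1
    return removed


def remaining_words(root):
    """List the text of all <word> elements still present, in document order."""
    return [w.text for w in root.iter() if local(w.tag) == "word"]


if __name__ == "__main__":
    tree_root = ET.fromstring(SAMPLE)
    print(strip_watermark(tree_root), remaining_words(tree_root))
```

Matching whole lines rather than individual words is a deliberate choice here: it avoids deleting a legitimate "Google" or "by" elsewhere on the page.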

Britannica's articles

There are two identical articles: 1911 Encyclopædia Britannica/Pausanias (general) and 1911 Encyclopædia Britannica/Pausanias (commander). Fix it please. -- Sergey kudryavtsev (talk) 07:59, 2 April 2021 (UTC)

Fixed, the two pages have been unified at 1911 Encyclopædia Britannica/Pausanias (Spartan commander). --Jan Kameníček (talk) 08:50, 2 April 2021 (UTC)
Thank you. -- Sergey kudryavtsev (talk) 09:06, 3 April 2021 (UTC)

Google often unable to find works at Wikisource

I have noticed many times that Google is not able to find works hosted in English Wikisource. E. g. today I tried searching for "zawis and kunigunde" wikisource and the result was that Google found a subpage of my userpage where I just mentioned the work, a talk page where I asked for something connected with the work, it found even a Wikidata item connected with the work, but did not find the work itself. Is there anything we could do to make works better discoverable? --Jan Kameníček (talk) 12:43, 3 April 2021 (UTC)

@Jan.Kamenicek: Google finds that as the first result for both "zawis and kunigunde" wikisource and, for that matter, just "zawis and kunigunde" (both on the mobile website, oddly). Since this is a recent page, it might just be that the Google crawler takes time to discover the page and add it to the index. The mobile website hit might be just that the mobile website has been indexed, but the main website hasn't yet. Inductiveloadtalk/contribs 13:38, 3 April 2021 (UTC)
It's now returning the main website (not mobile) page, so I guess the spider got there, and some magical algorithm decided (correctly) that main website was a better result. Inductiveloadtalk/contribs 19:22, 3 April 2021 (UTC)
if we linked at wikipedia, and wikidata, then works might be more findable. Slowking4Farmbrough's revenge 01:14, 4 April 2021 (UTC)
FWIW DuckDuckGo and Bing both found the pages without issue. — billinghurst sDrewth 06:26, 6 April 2021 (UTC)

@Whatamidoing (WMF): Do you have anybody who has an "in" with Google who could explain why this is happening? Or someone who can assist us with what is reasonably missing in our metadata? — billinghurst sDrewth 06:23, 6 April 2021 (UTC)

I don't. The last time I heard of anyone working on SEO stuff, it was @Deskana (who reclaimed his volunteer status a couple of years ago, so it's been a long time). Let me ask around. I'll let you know if I learn anything. Whatamidoing (WMF) (talk) 19:11, 6 April 2021 (UTC)

@Whatamidoing (WMF), @Billinghurst: Google populates its search engine in a variety of ways, most of which are Google secrets. The most likely explanation for why it took a little bit to show up in Google is that Google doesn't crawl Wikisource as much as it crawls sites like Wikipedia, because Wikisource doesn't get as much traffic as Wikipedia so it's not as critical if it's slightly out of date. The page was created on 31st March, so it's really not surprising that it took a few days for it to be picked up. The fact it picked up other pages first is also not surprising, as Google doesn't necessarily crawl websites linearly. It's now first in the results for me if I search for "zawis and kunigunde", which is excellent. All the metadata in the page HTML looks good and I don't really think there's much you could do to improve it. The most important thing is the link to the Wikidata item in the schema.org format in the HTML, which I can see is in there (search the page source HTML for "sameAs" and you'll see it). In fact, that good metadata is probably why Google switched over the link to the desktop version from the mobile version, as the "canonical" URL for the page is given as the desktop version. I don't think there's really anything to do to improve things, things are already pretty great, and it'll just take a few days to pick things up sometimes. --Deskana (talk) 10:05, 7 April 2021 (UTC)
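For those who would rather script the "search the page source" check than eyeball it, here is a rough Python sketch. It assumes the schema.org data sits in a script element of type application/ld+json and that the canonical URL is a link element with rel="canonical"; the sample HTML, the work title and the Wikidata ID in it are all invented for illustration, so treat it as a starting point rather than a description of what MediaWiki actually emits.

```python
import json
from html.parser import HTMLParser

# Invented sample page head; the URL and the entity ID are placeholders.
SAMPLE_HTML = """<html><head>
<link rel="canonical" href="https://en.wikisource.org/wiki/Example_work"/>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Article", "sameAs": "http://www.wikidata.org/entity/Q0"}</script>
</head><body></body></html>"""


class MetadataParser(HTMLParser):
    """Collect the canonical URL and any JSON-LD sameAs values from a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.same_as = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "canonical":
            self.canonical = attrs.get("href")
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                doc = json.loads("".join(self._buffer))
            except ValueError:
                return  # malformed JSON-LD: ignore rather than fail
            value = doc.get("sameAs") if isinstance(doc, dict) else None
            if isinstance(value, str):
                self.same_as.append(value)
            elif isinstance(value, list):
                self.same_as.extend(v for v in value if isinstance(v, str))


def extract_metadata(html):
    """Return (canonical_url, list_of_sameAs_urls) for an HTML string."""
    parser = MetadataParser()
    parser.feed(html)
    return parser.canonical, parser.same_as


if __name__ == "__main__":
    print(extract_metadata(SAMPLE_HTML))
```

An empty sameAs list for a work would suggest that the page has no linked Wikidata item yet, which is the gap discussed in this thread.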

@Deskana: thanks so much for that fulsome explanation, it helps, and this bit
The most important thing is the link to the Wikidata item in the schema.org format in the HTML, which I can see is in there (search the page source HTML for "sameAs" and you'll see it). In fact, that good metadata is probably why Google switched over the link to the desktop version from the mobile version, as the "canonical" URL for the page is given as the desktop version.
indicates that we need to rouse up our transcribers to do a better job of adding decent Wikidata. We have not been rigorous in getting all users to do it, though it is a tricky beast which Wikidata does not particularly assist. Some of the bot operators create shell items, which may not be particularly better. — billinghurst sDrewth 11:11, 7 April 2021 (UTC)

Newcomers in Recent changes

One of the filters in Recent changes is "Newcomers" who are supposed to be "Registered editors who have fewer than 10 edits or 4 days of activity". However, it seems that this filter stops displaying edits of those who reached over 10 edits, no matter whether they reached also 4 days of activity. Is it possible to fix the filter locally? --Jan Kameníček (talk) 00:36, 4 April 2021 (UTC)

@Jan.Kamenicek: Huh? That is not something I am seeing in special:recentchanges; where are you seeing that? Note that the page can vary depending on one's preferences. — billinghurst sDrewth 00:55, 4 April 2021 (UTC)
Oh new editors' contribs https://en.wikisource.org/w/index.php?title=Special:RecentChanges&userExpLevel=newcomer;learner&hidebots=1&hidecategorization=1&hideWikibase=1 Hmm, not certain how much control we have there mw:Release_notes/1.34#New user-facing features in 1.34, definitely not something I have explored. — billinghurst sDrewth 00:59, 4 April 2021 (UTC)
@Billinghurst: Yes, that’s it (only you have added Learners there too and so it shows more). For example there is a new user (Milivojevsasa) who started to be active in en.ws on 3 April 2021, but when I had this filter on I saw his edits only until their number reached 10, then the filter stopped displaying them. At the moment I cannot see his edits when I have this filter on, although he has not reached 4 days of activity in en.ws. The reason might be that he has longer activity in other Wikimedia projects, but I would expect that the filter of recent changes in Wikisource takes into account only Wikisource activity. --Jan Kameníček (talk) 01:10, 4 April 2021 (UTC)
@Billinghurst: Ah, you have probably clicked "new editor’s contribs" at the top which filters out both Newcomers and Learners. But I switched on only Newcomers after clicking "Filter changes". --Jan Kameníček (talk) 01:19, 4 April 2021 (UTC)
(ec) That is the tag itself, I just copied and pasted. Mediawikiwiki is very slim on detail. Phabricator:T149637 seems to be the only place with detail, and it says that NEWCOMER is an AND statement. — billinghurst sDrewth 01:22, 4 April 2021 (UTC)
Hmm, I can see it… If it is so, then it is very confusing, because the legend in RC states "OR", which would in fact also make much more sense... --Jan Kameníček (talk) 01:29, 4 April 2021 (UTC)
NEWCOMER is the combination of edits and time, not an "either" statement. Those falling outside of only one of those parameters fit into LEARNER, which currently aligns with AUTOCONFIRMED. Please provide a URL for what you wish reviewed, as I hate guessing. — billinghurst sDrewth 01:32, 4 April 2021 (UTC)
Settings are not locally configurable, though the surroundings may be. — billinghurst sDrewth 01:45, 4 April 2021 (UTC)
@Jan.Kamenicek: Is it something you are seeing in the filter names and descriptions? phab:T149385, and probably worth looking at mw:Help:New filters for edit review/Filtering — billinghurst sDrewth 01:49, 4 April 2021 (UTC)
And all now explained, as you are using the javascript interface through your preferences, which I am not, so I am not seeing half the cruft that you are discussing. — billinghurst sDrewth 02:15, 4 April 2021 (UTC)
@Billinghurst: I had a look at the mw help page you have linked to, and in the section Filters lists there is also written: Newcomers–Registered editors who have fewer than 10 edits OR 4 days of activity :-) So after your explanation I will try to remember that they write OR while they mean AND :-) I also understand that it is not possible to change it locally so I will not bother about it anymore and I will use this filter in combination with the Learners filter instead. Thanks very much for all the explanation. --Jan Kameníček (talk) 10:13, 4 April 2021 (UTC)
@Jan.Kamenicek: This is set in Mediawiki like this:
/**
 * The following variables define 3 user experience levels:
 *
 *  - newcomer: has not yet reached the 'learner' level
 *
 *  - learner: has at least $wgLearnerEdits and has been
 *             a member for $wgLearnerMemberSince days
 *             but has not yet reached the 'experienced' level.
 *
 *  - experienced: has at least $wgExperiencedUserEdits edits and
 *                 has been a member for $wgExperiencedUserMemberSince days.
 */
$wgLearnerEdits = 10;
$wgLearnerMemberSince = 4; # days
$wgExperiencedUserEdits = 500;
$wgExperiencedUserMemberSince = 30; # days
and is implemented as follows:
if ( $editCount < $wgLearnerEdits ||
	$registration > $learnerRegistration ) {
	return 'newcomer';
}
so they should be a "newcomer" if they are either (less than 10 edits) OR (younger than 4 days) (or both). Which makes some sense, as you don't stop being a newcomer if you have 2 edits, 8 weeks apart. I think the confusion in wording is that the condition to not be a newcomer any more is, by application of De Morgan's theorem, an AND statement:
NOT ((less than 10 edits) OR (younger than 4 days)) = (10 edits or more) AND (older than 4 days)
The user account in question has existed but been dormant since 16 October 2014, so that's why they dropped off the newcomers' list after 10 edits - their account age condition has been satisfied long ago.
Regardless, we do technically have the ability to request that the English Wikisource values of wgLearnerEdits and wgLearnerMemberSince are adjusted if we (as a site) want (to do this, open a Phab ticket and add the Sites-Requests team). But it would apply globally to everyone's RC lists at enWS. There is no way to have a personal cut-off in the existing RC page that I know of, other than something hacky using the API or some kind of Toolforge tool with DB access. Inductiveloadtalk/contribs 10:48, 4 April 2021 (UTC)
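To make the logic concrete, here is a small Python sketch of the three levels. The newcomer test mirrors the PHP condition quoted above; the "experienced" and "learner" branches are my reading of the doc comment, not copied from MediaWiki, and the function and constant names are invented for illustration.

```python
from datetime import datetime, timedelta, timezone

# Thresholds quoted from the MediaWiki defaults above.
LEARNER_EDITS = 10
LEARNER_MEMBER_SINCE_DAYS = 4
EXPERIENCED_EDITS = 500
EXPERIENCED_MEMBER_SINCE_DAYS = 30


def experience_level(edit_count, registered, now):
    """Classify an account as 'newcomer', 'learner' or 'experienced'."""
    learner_cutoff = now - timedelta(days=LEARNER_MEMBER_SINCE_DAYS)
    experienced_cutoff = now - timedelta(days=EXPERIENCED_MEMBER_SINCE_DAYS)
    # Newcomer: too few edits OR account registered too recently (or both).
    if edit_count < LEARNER_EDITS or registered > learner_cutoff:
        return "newcomer"
    # Assumed from the doc comment: both experienced thresholds must be met.
    if edit_count >= EXPERIENCED_EDITS and registered <= experienced_cutoff:
        return "experienced"
    # Everyone else has passed the newcomer test but is not yet experienced.
    return "learner"


if __name__ == "__main__":
    now = datetime(2021, 4, 4, tzinfo=timezone.utc)
    dormant = datetime(2014, 10, 16, tzinfo=timezone.utc)
    for edits in (9, 10):
        print(edits, experience_level(edits, dormant, now))
```

With the account from this thread (registered 16 October 2014), the ninth edit is still classed as newcomer and the tenth flips the account to learner, because the age condition was satisfied years ago.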
Oh, thanks for the very detailed explanation! So the problem was that the activity is not counted from the first edit, but from creating the account no matter if the creation was intentional or just automatic. So I would suggest to change it so that it counted from the first edit if it is possible, but only if (as you have said) we want it as a site, it would not make sense to change for the whole ws only because of me. --Jan Kameníček (talk) 14:25, 4 April 2021 (UTC)
Counting from the first edit should be possible, but it'd be a change to the Mediawiki core code: change $registration = $this->getRegistration(); to $registration = $this->getFirstEditTimestamp(); it's actually not that concise, because the getFirstEditTimestamp() method is deprecated. I guess you could raise a Phab ticket to suggest calculating it that way.
Changing the values of wgLearnerEdits and wgLearnerMemberSince is just a config change. I kinda feel that 10 edits is on the low side for a learner, because you can blow through 10 edits in no time in the page NS at WS and then you'll drop off the "newcomer" radar (if you don't have a brand new account) and become a "learner" (which lumps you in with users right up to the 500 edit mark). Inductiveloadtalk/contribs 15:22, 4 April 2021 (UTC)
I created the task T279258. As for raising the level of 10 edits, I agree, but not too high, let’s say 20–30. The aim of this filter is to filter out the very new accounts where there is quite a high probability of meeting a single-use vandal account. Accounts surviving 4 days and a certain number of edits without being blocked can usually be considered serious users learning to contribute. --Jan Kameníček (talk) 17:50, 4 April 2021 (UTC)

Offer of a proofread text

There is an offer to use a proofread text at Index talk:The Commentaries of the Emperor Marcus Antoninus.pdf, which was written there by the same contributor who created the Index page. Would it be possible to employ a bot to add the proofread text to the individual pages of the book? --Jan Kameníček (talk) 17:57, 4 April 2021 (UTC)

At a glance, this doesn’t seem suitable for a merge-and-split because the pages are renumbered in a way that makes it challenging to reconstruct the original sequence. Languageseeker (talk) 18:56, 4 April 2021 (UTC)

Threshold for disabling Validation for a page

I was wondering if it makes sense to block validation for a page if the differences are too significant. For example, if there are more than 5 characters changed in the text, it would seem that it could benefit from another glance. Not as an indictment of any user, but more of a commitment to quality. Languageseeker (talk) 18:50, 4 April 2021 (UTC)

Not supported. If a template is amended or added, then it would flag under such a rule. There is no sensible way of distinguishing content from presentation in the text. Beeswaxcandle (talk) 19:11, 4 April 2021 (UTC)

Removing br from Index:The torrent and The night before.djvu

For the current PoTM, many of the pages have a br appended to every line that needs to be removed. Is there any way to do that automatically? Languageseeker (talk) 18:54, 4 April 2021 (UTC)

Shouldn't be removed. They are a correct method of presenting poetry (in conjunction with block templates). Beeswaxcandle (talk) 19:14, 4 April 2021 (UTC)
On the proofread pages, it uses the poem tag without breaks. On the unproofread pages, it uses break. Isn't poem the correct tag here or are the proofread pages incorrect? Languageseeker (talk) 21:12, 4 April 2021 (UTC)
The poem tag is unpredictable and does not work well when applied across pages. As someone who regularly works on poetry and drama, I've switched to using br tags almost exclusively, because of the issues with using the poem tag across pages. --EncycloPetey (talk) 21:32, 4 April 2021 (UTC)
yeah - we might want to think about deprecating poem code, since we are going back to br. Slowking4Farmbrough's revenge 23:17, 6 April 2021 (UTC)
FYI: I (and @Xover:) am working on a new system: {{ppoem}}, intended to be a good replacement for poem. Specifically, it comes with hanging indents so it wraps better on small screens and uses spans for each line, as well as rendering out as a single div overall when transcluded. It's not quite ready for prime-time just yet, mostly because dropinitials are a major PITA to get right and the current approach slightly skews positioning on the page. And I do want it "right" before suggesting general use. It's not quite at the point where I'm canvassing general opinion on it (I will, but not just yet), but if anyone has smart CSS ideas, I'm all ears :-) Inductiveloadtalk/contribs 08:51, 7 April 2021 (UTC)

Wikisource Discord server

Reminder that there is an unofficial Discord server for the English Wikisource. If you have a Discord account and would like to join and chat with other editors there, please feel free to do so: https://discord.gg/cVv8hjbF (invite is permanent). The server's been around for a number of months now and a few of us have been talking there pretty regularly, but it'd always be nice to have more members. PseudoSkull (talk) 04:24, 5 April 2021 (UTC)

The Tragedy of Romeo and Juliet (Dowden)

I tested a download of this work (PDF) and found that the Notes did not work as they should. The "Prologue" will be sufficient to test this issue. Instead of getting the two groups of notes in the download, I get error messages for the ref tag.

Is this issue the result of a failing of some kind in the work itself (that I could not find), or a problem in the conversion to PDF that cannot handle the notes? --EncycloPetey (talk) 15:40, 5 April 2021 (UTC)

@EncycloPetey: It's a known limitation. See phab:T274654. The Parsoid team are working on it, but the ETA is unpredictable. --Xover (talk) 18:13, 5 April 2021 (UTC)
Thanks for letting me know! --EncycloPetey (talk) 20:07, 5 April 2021 (UTC)

Tech News: 2021-14

19:41, 5 April 2021 (UTC)

Public domain comics on DigitalComicMuseum

I just stumbled across digitalcomicmuseum.com, which has quite a few comic books whose rights weren't renewed. It might be worth checking out, especially since portal:Comics is in pretty poor shape. Mcrsftdog (talk) 20:45, 5 April 2021 (UTC)

Note: It might be a good idea to do a bulk upload of these comics to Wikimedia Commons to make doubly sure they are preserved. PseudoSkull (talk) 15:45, 6 April 2021 (UTC)
I've seen it, but comics take too much work for Wikisource and we add too little to them to really make them worth the time for me.--Prosfilaes (talk) 00:58, 7 April 2021 (UTC)
I just spent a couple of hours there, surprised at how much good stuff was available, e.g. Brain Bats of Venus. I agree with the other comments, and there is little benefit in adding a transcript, but wonder if merely presenting the images might be desirable; I assume that is reasonably straightforward if they are at Commons. If Jack Cole had an Author page here I would be using it. CYGNIS INSIGNIS 07:00, 8 April 2021 (UTC)
I don't think the effort is worth the result in practice, but I'll still note that there is definitely some value to transcribing any text content. Much like transcribing old movies and posters etc., the transcription makes it searchable and possible to access for people with vision impairment. But I think we'd need better tooling to make the cost-benefit add up. Our current tooling doesn't make the comics format easy to work with, and won't let us present the results in a useful way without massive manual effort.
I could be persuaded that just presenting the images was a useful stop-gap, but it'd take some thought and style guidelines + supporting templates to make it more than just an image dump; and even for that I would prefer to see an active WikiProject coordinating. I don't want to see haphazard addition of a couple of comics, done up inconsistently, "just because we can". The form is specialised enough that someone needs to really care about it to really do it justice. (oh, and we'd need to adjust our formal scope to permit it, let's not forget)
But if someone were to throw resources at a VisualEditor-type editor for comics pages, and a dedicated comics viewer in MediaWiki, I would certainly be all for it. With sufficiently fancy tooling, comics would be easier to do up than books and periodicals, and we could have a sizeable library of them in relatively short order. --Xover (talk) 09:54, 8 April 2021 (UTC)
As well as searchable, the text can be easily translated by online translators, making them at least somewhat accessible to non-English speakers.--Prosfilaes (talk) 03:21, 9 April 2021 (UTC)

Universal Code of Conduct – 2021 consultations

Universal Code of Conduct Phase 2

The Universal Code of Conduct (UCoC) provides a universal baseline of acceptable behavior for the entire Wikimedia movement and all its projects. The project is currently in Phase 2, outlining clear enforcement pathways. You can read more about the whole project on its project page.

Drafting Committee: Call for applications

The Wikimedia Foundation is recruiting volunteers to join a committee to draft how to make the code enforceable. Volunteers on the committee will commit between 2 and 6 hours per week from late April through July and again in October and November. It is important that the committee be diverse and inclusive, and have a range of experiences, including both experienced users and newcomers, and those who have received or responded to, as well as those who have been falsely accused of harassment.

To apply and learn more about the process, see Universal Code of Conduct/Drafting committee.

2021 community consultations: Notice and call for volunteers / translators

From 5 April – 5 May 2021 there will be conversations on many Wikimedia projects about how to enforce the UCoC. We are looking for volunteers to translate key material, as well as to help host consultations on their own languages or projects using suggested key questions. If you are interested in volunteering for either of these roles, please contact us in whatever language you are most comfortable.

To learn more about this work and other conversations taking place, see Universal Code of Conduct/2021 consultations.

-- Xeno (WMF) (talk)

20:45, 5 April 2021 (UTC)

Phabricator ticket: Selective means to exclude (sub)pages from Special:UnconnectedPages

Special:UnconnectedPages for the Wikisources is polluted and often unuseful due to a proliferation of subpages of works that would not typically get WD items, eg. chapters of novels. I have created a phabricator ticket to ask for a means for us to tag works where the subpages and identified pages should be excluded from that special page. — billinghurst sDrewth 02:01, 6 April 2021 (UTC)

Maintenance task : Creating WD items

We need to be more rigorous about creating and linking WD items when works are created. We aren't that brilliant at it.

We should also be setting up some maintenance tasks to get onto some of the huge backlog where we have subpages of collected works (poems, short stories, ...). — billinghurst sDrewth 03:13, 6 April 2021 (UTC)

Maintenance Proposal: Remove PG imports and flag Incomplete Projects w/o scans for deletion

This site has many imports from Project Gutenberg from its earlier days that detract from the high quality of the scan-backed works that it currently produces. Even worse, these tend to be of the most important works of the English language. I propose mass deleting them all. It’s time to replace them with scan-backed versions.

In addition to Project Gutenberg imports, this site has many abandoned transcription projects without scans from its early days. I propose flagging them all for deletion and automatically removing them after three months if nobody has merged-and-split them to a scan. Languageseeker (talk) 12:36, 6 April 2021 (UTC)

@Languageseeker: I'm not sure I'm on board with deletion on any kind of short-in-WS-terms horizon (i.e. less than years), because with the resources we have, that's certain death for most of them, and they're not quite zero value IMO (at least not the complete ones). I would support a proposal to move {{Project Gutenberg}} from being hidden on Talk pages to being on main pages like {{incomplete}}. I'd also support the deletion of the PG version after a scan-backed alternative is transcluded and making a WS:D#Precedent entry for pro-forma nominations of them. Inductiveloadtalk/contribs 12:49, 6 April 2021 (UTC)
@Inductiveload: Do we need a formal proposal for moving {{Project Gutenberg}} from being hidden on Talk pages? Also, we should probably check to make sure that the work has not been transcluded because on a quick glance, I found 3 that are flagged as PG but have been transcluded. Languageseeker (talk) 13:23, 6 April 2021 (UTC)
Addendum, there are also works such as Talk:Nightmare_Abbey that do not use the {{Project Gutenberg}}. Is there any way to automatically add {{Project Gutenberg}}? Languageseeker (talk) 13:44, 6 April 2021 (UTC)
@Languageseeker: Category:Works possibly copied from Project Gutenberg for works that mention "Gutenberg" in the {{Textinfo}} source field. There will be lots of overlap with works that already use {{Project Gutenberg}} as well as works that are now transcluded. It'll take some time for the category to fill on the server side. Inductiveloadtalk/contribs 14:10, 6 April 2021 (UTC)
  • We don't even have a hard requirement that new works be scan-backed, so, no, deleting non-scan-backed works (including PG) is premature. Focus your efforts on proofreading high quality scan-backed versions of such works instead. --Xover (talk) 13:59, 6 April 2021 (UTC)
@Xover: Should we discuss making that a requirement? It would make sense from both a copyright and long term perspective. Languageseeker (talk) 19:14, 6 April 2021 (UTC)
  • show me the quality improvement process of first seeking out a scan, especially now that there are a million books at commons. deletion is not a quality improvement tool. Slowking4Farmbrough's revenge 23:13, 6 April 2021 (UTC)
The issue is that there is no way to verify the accuracy of these texts. Since Wikisource doesn't keep up with PG, the corrections of PG remain unimported. To me it seems that the PG texts detract from the quality of the scan-backed texts. They also disincentivize the creation of scan-backed replacements. Wikisource's strongest selling point is scan-backed transcriptions, and PG texts undermine that. Languageseeker (talk) 02:29, 7 April 2021 (UTC)
the issue is we have a consensus to migrate non-scanned backed to scan-backed, and we have perpetual proposals to change that consensus without success. low quality work does not detract from high quality work. increasing the scrap rate does not improve the process. if you want to top-down dictate the process of transcription go over to German wikisource. that's how they act over there, and we are leaving them in the dust. we are increasing proofread pages at a higher rate. make a maintenance category of non-scanned back and we can work the backlog. delete the pages and we cannot work the backlog. Slowking4Farmbrough's revenge 00:40, 10 April 2021 (UTC)
@Slowking4: Yes, I heard the community and I accept that nobody is on board with a mass deletion. Would you be ok with a proposal that said no new PG imports without migration to original scan? This way the backlog can be cleared slowly over time? Languageseeker (talk) 01:10, 10 April 2021 (UTC)
We're here in part as a text archive for Wikipedia. Deleting all these works that Wikipedia links to hurts that. Gutting a large set of core works is going to offer theoretical gains at a huge practical cost, and could possibly even delay the creation of scan-backed replacements by making us less visible and less enticing to new users.--Prosfilaes (talk) 20:11, 7 April 2021 (UTC)
Agreed. We should be seeking quality scans of good provenance so that we can convert such texts into scan-backed editions. There is a lot of work being done right now to accomplish this for the plays of Shakespeare. We have people doing the same for notable novels and works of poetry. The best way to reduce the number of PG texts and works not backed by a scan is to find a high-quality scan, set up the scan, add full supporting info for the scan at Commons and Wikidata, set up and start the Index page, and link the Index page from the Author page and the current text. --EncycloPetey (talk) 01:31, 8 April 2021 (UTC)

Based on the feedback, it seems that the community opposes a mass deletion. What about this version:

  1. Allow for the sdelete of any none scanned-backed text when a scanned-backed version exists.
  2. Allow for the sdelete of any none scanned-backed text that is incomplete and not potentially useful for a merge-and-split, e.g. a later reprint of an earlier edition.
  3. Disallow the creation of new projects that are not scanned-backed.

Languageseeker (talk) 20:31, 7 April 2021 (UTC)

I don't understand your proposal. Why would we ever delete scan-backed works? --EncycloPetey (talk) 01:27, 8 April 2021 (UTC)
@EncycloPetey: Whoops, some major Errata. I meant non-scanned backed. Languageseeker (talk) 01:29, 8 April 2021 (UTC)
I can agree with number 1 provided that the non-scan-backed text is not another important edition. For some works, there are multiple valuable editions, even multiple valuable first editions. Editions of Shakespeare plays, even the early ones, can have huge differences, so deleting one because another edition exists is insufficient. Likewise, many classical works exist in multiple translations, and even one person's translation can have more than one important edition. I cannot agree with number 3, since there are texts where a scan is not feasible, but the text could be created. We have had hand-made transcriptions of historical documents, or hard-to-find public domain documents that were published within another work that is under copyright, so the work cannot be scanned. --EncycloPetey (talk) 01:39, 8 April 2021 (UTC)

Rotated pages in PDF

Is there a fix for this PDF? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:30, 6 April 2021 (UTC)

  • @Pigsonthewing: File:Boy Scouts and What They Do.djvu. Split with Scan Tailor and then OCRed and DjVuified with a disgusting script that I will one day clean up and post somewhere! Pardon the much larger file size, but the images have been rotated a bit and so they don't compress as much as they were because the moiré is off-axis. Remember that pictures should be extracted from the original PDF (e.g. with pdfimages, not by screenshotting!), not this file. Inductiveloadtalk/contribs 20:47, 6 April 2021 (UTC)

Problem: Poems attributed to Poems (Botta)

We have the work Poems (Botta) which is an 1853 edition of the 1848 work by author:Anne Lynch Botta and we have the subpages special:prefixindex/Poems (Botta)/, however we have an issue that there are quite a few works claiming to be from the work that I cannot see in the work Special:WhatLinksHere/Poems (Botta) (flick to page 3) and look at search "from Poems (1848)".

I have done internal text searches and can definitely show that we have only the one version of each of these works here, and not in transcluded form. Exact text and title searches (Google, Bing, DuckDuckGo) can sometimes confirm the works as Botta's; sometimes we have the only copy of a text. Poetry and its provenance are not my area of expertise, so I am hoping that someone can confirm my searches and provide guidance. — billinghurst sDrewth 23:50, 6 April 2021 (UTC)

@Billinghurst: These appear to have come from Index:Memoirs of Anne C. L. Botta - 1894.pdf (cf. this bit of the TOC). Inductiveloadtalk/contribs 00:18, 7 April 2021 (UTC)
Good get. I knew that a couple were in there, and had checked the ToC; it seems that some are embedded in the body without a ToC or index entry. <thumbsup> <sigh>. I will relocate them. — billinghurst sDrewth 08:30, 7 April 2021 (UTC)
  moved to subpages of the identified work. Someone will have fun at some point in time (or not). — billinghurst sDrewth 09:41, 7 April 2021 (UTC)

Tennyson author page &c

I am looking for recommendations for a strong, comprehensive PD collection of Tennyson's works that we could bring over to WS. His Author page could use some cleaning up, in my opinion, to include an Index of Titles subpage perhaps. Currently, the majority of poems listed are unindexed (many verified against "The Complete Poetical Works of Tennyson" ed. Frederick Page (pub. Oxford University Press, 1953)). Over at IA, I am seeing The Works of Alfred Lord Tennyson in ? vols. in several editions. I notice we have the Index:The works of Alfred Lord Tennyson (1899, v 1).djvu—hardly worked on—but is that a preferred source? Thanks for any input, Londonjackbooks (talk) 08:47, 7 April 2021 (UTC)

Comment, is the Page edition out of copyright? Otherwise, that's a copyright violation. Languageseeker (talk) 20:49, 7 April 2021 (UTC)
I see ten volumes of (IA) Languageseeker (talk) 20:55, 7 April 2021 (UTC)

Help with Splitting Pages

Could somebody help me split the pages in commons:Paradise Lost 1674.djvu? Languageseeker (talk) 00:14, 8 April 2021 (UTC)

Help Cleaning up Author:Charles John Huffam Dickens

Poor Dickens has got a terrible author page. Most of the editions there are posthumous reprints that have little value, and in general it's a bit of a mess. Could someone help me clean up this page? We'd probably want to create subpages for each of his works, listing the periodical versions, published edition, cheap edition, 1858 Library edition, and the 1867 Complete Works of Charles Dickens. Anything after that is just reprints until the Clarendon editions, which are in copyright. Charles_Dickens_bibliography is a help, but that could also use expanding. Languageseeker (talk) 05:26, 8 April 2021 (UTC)

We would not want to create Author: subpages for each work. Versions pages in Mainspace for those who search by name of work, and a simple indented list on the Author: page for those who come in by searching for Dickens, are sufficient. Beeswaxcandle (talk) 06:04, 8 April 2021 (UTC)
@Beeswaxcandle: Are you sure? Oliver Twist alone has 12 authoritative editions by the author. Languageseeker (talk) 20:46, 8 April 2021 (UTC)
Yes, I'm sure. Author: subpages for a single work are not the intention of such pages. Beeswaxcandle (talk) 05:11, 9 April 2021 (UTC)
@Beeswaxcandle: So, is something like Wuthering Heights a mistake? Languageseeker (talk) 05:16, 9 April 2021 (UTC)
I think (?) this is just a simple misunderstanding based on the way each of you is using the jargon. If I'm understanding correctly, @Languageseeker: means that we should have pages for each work, to list the various editions of the work. If so, that seems sensible and common, and Wuthering Heights looks like as good an example as any of what's common practice around here. But we wouldn't call that a subpage -- I think what @Beeswaxcandle: was understanding you to mean was something of the form of [[Author:Charles John Huffam Dickens/Hard Times]], which would certainly be an anomaly. -Pete (talk) 05:30, 9 April 2021 (UTC)
Yes, I think that you’re exactly right. I want to make Hard Times the page that lists all the versions of the work instead of containing the text of one specific edition. Languageseeker (talk) 05:53, 9 April 2021 (UTC)
Then, why did you call it a "subpage"? You mean a versions page. Once we are hosting multiple versions (editions) of Hard Times, then we'll need a versions page. Until then, the single version can stay where it is and the list of other versions/editions should be on the Author: page. Generally, we prefer to avoid redlinks on the three types of disambiguation pages (disambiguation, versions, translations). [Knowing as I say this, that there are redlinks on some versions pages, but these do not set precedent.] Beeswaxcandle (talk) 06:49, 9 April 2021 (UTC)
Sorry for the poor terminology. Probably the result of a sleep-addled brain. I'm still a little concerned about posting the lists on the Author page, because it's already starting to look like a mess given the large number of versions that Dickens contributed to and the need to distinguish between those and other versions. Languageseeker (talk) 14:11, 9 April 2021 (UTC)
Uh, since when is using the incorrect terminology a good topic for public discussion? We're all learning here. We got past the misunderstanding, maybe we can look forward, not litigate minor irritations.
It seems to me that in the case of Oliver Twist, the challenge is that a specific edition is occupying the title that would be used for a versions page. So a page move would be required, which, given all the subpages, might be something to approach with caution. That's what I see Languageseeker doing here; in another section they have asked if the title could be changed, but they've gotten no response. I think if we can just establish what the best practice is for that kind of move, the issue would be resolved. (And I'm happy to help out with this in the coming weeks. You're right, the page is not very useful in its current state.) -Pete (talk) 16:59, 9 April 2021 (UTC)

New texts

What are the criteria for adding works to {{New texts}}, and thus to the main page? I added Boy Scouts and What They Do yesterday, but another editor has removed it, without asking me first. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:29, 8 April 2021 (UTC)

discussion prior to temporarily removing text CYGNIS INSIGNIS 16:09, 8 April 2021 (UTC)
In which you mentioned removing the work from the template (let me check...) zero times, before doing so. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:08, 8 April 2021 (UTC)