Wikisource:Scriptorium/Archives/2020-07

Feedback on movement names

Hello. Apologies if you are not reading this message in your native language. Please help translate to your language if necessary. Thank you!

There are a lot of conversations happening about the future of our movement names. We hope that you are part of these discussions and that your community is represented.

Since 16 June, the Foundation Brand Team has been running a survey in 7 languages about 3 naming options. There are also community members sharing concerns about renaming in a Community Open Letter.

Our goal in this call for feedback is to hear from across the community, so we encourage you to participate in the survey, the open letter, or both. The survey will go through 7 July in all timezones. Input from the survey and discussions will be analyzed and published on Meta-Wiki.

Thanks for thinking about the future of the movement, --The Brand Project team, 19:39, 2 July 2020 (UTC)

Note: The survey is conducted via a third-party service, which may subject it to additional terms. For more information on privacy and data-handling, see the survey privacy statement.


WMF is planning a rebranding affecting all projects

I strongly encourage everyone to go take this survey and to participate in the discussion at the linked venues. Note that the survey closes in 4 days (we got this notification three weeks after they started the survey)! The Board of Directors of the Wikimedia Foundation has already decided that there will be a rebranding, with a final decision on the brand set for August 2021 (that is, just over 12 months from now). At that point they may move forward with one of the options; pause or adjust the rebranding process; or abandon it altogether. The options they have presented and want the community to "help refine" are:

  | | Option #1 | Option #2 | Option #3 |
  | | Wikipedia as a network | Wikipedia as a Movement | Wiki + Wikipedia |
  | Movement | Wikipedia Network | Wikipedia Movement | Wiki |
  | Movement tagline | Part of the Wikipedia Network | Part of the Wikipedia Movement | A Wiki Project (for projects); A Wiki Organization (for organizations) |
  | User groups | Wikipedia Group Penguins | Wikipedia Group Penguins | Wikigroup Penguins |
  | Chapters | Wikipedia Network Antarctica | Wikipedia Organization Antarctica | Wikipedia Foundation Antarctica |
  | Foundation | Wikipedia Network Trust | Wikipedia Organization | Wikipedia Foundation |

That is, Wikisource will get one of the taglines: Part of the Wikipedia Network, Part of the Wikipedia Movement, or A Wiki Project. And the project will be hosted by either the Wikipedia Network Trust, the Wikipedia Organization, or the Wikipedia Foundation. The names of the projects appear unlikely to change as a result of this process, so we'll still be "Wikisource", and I don't see any push to change project logos either (but other branding elements definitely might).

The central point for the Board is to more strongly identify themselves with Wikipedia, because that's what potential donors recognise, which is why the options above are just three variants of "Wikipedia". --Xover (talk) 10:39, 3 July 2020 (UTC)

Re-purpose WikiProject OCR to WikiProject Scans

I propose to re-purpose the defunct (6 years since the last edit) WikiProject OCR to be "WikiProject Scans" with a focus on scan-backing works and acquiring scans for new works. OCR is not a problem we have often any more, since the OCR button(s) work well even when the scans don't have an existing text layer. Adding OCR to scans can remain a part of WikiProject Scans, but it will also acquire new competencies:

  • Acquisition and set-up of scans for existing works
  • Repair of existing scans (patching pages, placeholders, etc.)
  • Tracking of backlogs for scan-related maintenance

Based on the link saying "users can request scans" at Wikisource:WikiProject, OCR probably already has these aspects, but it's been forgotten. Inductiveloadtalk/contribs 14:33, 6 July 2020 (UTC)

FYI: Easy OCR

https://github.com/JaidedAI/EasyOCR Justin (koavf)TCM 23:54, 8 July 2020 (UTC)

Announcing a new wiki project! Welcome, Abstract Wikipedia

Sent by m:User:Elitre (WMF) 20:10, 9 July 2020 (UTC) - m:Special:MyLanguage/Abstract Wikipedia/July 2020 announcement

WikiProject Emory University Libraries

I just came across this interesting WikiProject, which, apparently, is not mentioned anywhere else. It is based upon WS:NLS, but it seems to have slipped under the radar more than that project. I encourage all Wikisource contributors to help out with proofreading the new works uploaded by this project. TE(æ)A,ea. (talk) 23:43, 10 July 2020 (UTC).

Backlog of works untagged with {{index transcluded}} is nearly done!

From a high a few weeks ago of 500, we're down to only 23 works in Category:Index Proofread or Category:Index Validated but that lack an {{index transcluded}} tag (excluding the 205 from WS:NLS or WS:EUL). Most of the remaining ones are weird/questionable enough that I'd love someone else to look into them, which is why I'm posting here. You can find the list on this PetScan. JesseW (talk) 20:03, 11 July 2020 (UTC)

And they are all done! Wheeee!! JesseW (talk) 17:14, 12 July 2020 (UTC)
33 NLS ones left right now: https://petscan.wmflabs.org/?psid=16764274 -- @Gweduni:? -- JesseW (talk) 17:22, 12 July 2020 (UTC)
And I've just gone through those as well. ShakespeareFan00 (talk) 19:21, 12 July 2020 (UTC)
Much appreciated! JesseW (talk) 19:46, 12 July 2020 (UTC)

Cleaning up Special:Contributions/Bluealbion

I have identified the likely sources of the user's contributions. Except for 1912 Progressive Party Platform, the others seem to be collected in a book. Would anyone please proofread them?--Jusjih (talk) 01:49, 14 July 2020 (UTC)

Proofread what? Have you uploaded the book? If so, please link to it. JesseW (talk) 03:03, 14 July 2020 (UTC)
here you go [11] (but new york state appears different). the educators have done a web 1.0 on the ephemera, and we can source them to the pamphlets. Slowking4Rama's revenge 20:48, 14 July 2020 (UTC)

A brainstorm...checking OCRs against one another

I had an idea for a text processing script, and I'm curious if something like this has been tried before, exists, etc.

Let's use this example. Internet Archive has four separate scans of this text, All Over Oregon and Washington. For the purpose of expressing my idea clearly, let's assume that all four are the same edition of the book, but different physical books, and possibly run through different OCR engines. The goal would be to get an even more accurate OCR of the original text, by comparing all four.

A program could find which line corresponds to which, and then if two or three of them agree on a specific line, but others disagree, it would take the line with the strongest agreement. Then it would output a single text file representing its best estimate of the most accurate OCR. This would eliminate some errors resulting from notes and scribbles on the page, dust in the scanning apparatus, and maybe some OCR engine errors as well.

Is this an idea that exists in any software already? And/or, for anybody with coding skills...can you assess how "doable" it would be? -Pete (talk) 21:42, 3 July 2020 (UTC)

Assuming the page matching is done (either based on image, OCR or manual), I imagine the process would be something like comparing each line of each page against each line of the other page using something like w:Levenshtein distance. Then, when you have a list of sets of likely line matches, you need to figure out which is "best". For that, you can use a majority-vote system, but you'd need at least 3 scans. Another heuristic I can think of might be to check if words that fail a spell check pass the spell check in an alternative scan. Then take the one that passes. In your work, for example, two different scans have:
topography, scenei^, soil, climate, productions, and improve- 
-^ topography, scenery, soil, climate, productions, and improve- 
So you can see you might be able to choose "scenery", here. On the other hand, the "scenery" scan has junk at the start that "scenei^" doesn't. And sometimes no scan might have a correct word:
The Columhia s log-book certainly does not betray 
The OdunibialB log-book certainly does not betray 
In this case, you might figure that "Columhia" is closer to a real word and choose that (correcting it is optional).
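As a rough illustration of the line-matching and majority-vote idea described above (a minimal sketch, not an existing tool): the function names are made up for this example, the lines are assumed to be page-matched already, and the word-by-word vote only works when the matched lines split into the same number of words, so junk such as a leading "-^" would need to be stripped first.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (classic dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]


def best_match(line: str, candidates: list[str]) -> str:
    """Pick the candidate line with the smallest edit distance to `line`."""
    return min(candidates, key=lambda c: levenshtein(line, c))


def vote_words(lines: list[str]) -> str:
    """Word-by-word majority vote over lines assumed to be the same text.

    Falls back to the first line when the word counts differ.
    """
    split = [ln.split() for ln in lines]
    if len({len(words) for words in split}) != 1:
        return lines[0]
    return " ".join(Counter(col).most_common(1)[0][0] for col in zip(*split))


def merge_scans(scans: list[list[str]]) -> list[str]:
    """Use the first scan as the base; match and vote against the others."""
    base, others = scans[0], scans[1:]
    merged = []
    for line in base:
        group = [line] + [best_match(line, other) for other in others]
        merged.append(vote_words(group))
    return merged


if __name__ == "__main__":
    # Made-up OCR variants of one page, for illustration only.
    scan_a = ["topography, scenei^, soil, climate", "The Columhia s log-book"]
    scan_b = ["topography, scenery, soil, climate", "The Columbia s log-book"]
    scan_c = ["topography, scenery, soil, clirnate", "The Columbia s log-hook"]
    for line in merge_scans([scan_a, scan_b, scan_c]):
        print(line)
```

With only two scans there is no majority, so ties simply keep the base scan's word; that is where the spell-check heuristic mentioned above would have to take over.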
I think that while you certainly might be able to get some benefit out of this, the need for multiple scans that can be page-matched and line-matched (often one OCR breaks lines or interleaves columns, which will be awkward) makes it a bit unwieldy in the general case. I think you'd probably get comparable results from a post-processing script that exploits knowledge of common OCR mistakes combined with knowledge of what letters can and cannot appear in words. For example, "mhia" never occurs in English, so you can probably correct to "mbia". The advantage is that it's independent of having multiple scans. I have such a script that I've been working on: User:Inductiveload/cleanup.js, but it needs quite a bit more work. But you should be able to use it right now. There are configuration options (e.g. you can turn on a rudimentary long-s corrector, or disable corrections that might damage German words like "und" -> "and").
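To make the post-processing idea concrete, here is a minimal sketch. It is not the code of User:Inductiveload/cleanup.js; the rule tables, the function name and the `english_only` switch are assumptions invented for this example, with only the "mhia" and "und" cases taken from the discussion.

```python
import re

# Hypothetical rule table: letter sequences assumed never to occur in English,
# mapped to their likely intended form.
IMPOSSIBLE_SEQUENCES = {
    r"mhia": "mbia",
}

# Hypothetical rules that are only safe for English text, because they would
# damage legitimate words in other languages (e.g. German "und").
ENGLISH_ONLY_RULES = {
    r"\bund\b": "and",
}


def clean_ocr(text: str, english_only: bool = True) -> str:
    """Apply the substitution rules in order; skip risky ones if requested."""
    rules = dict(IMPOSSIBLE_SEQUENCES)
    if english_only:
        rules.update(ENGLISH_ONLY_RULES)
    for pattern, replacement in rules.items():
        text = re.sub(pattern, replacement, text)
    return text


if __name__ == "__main__":
    print(clean_ocr("The Columhia s log-book certainly does not betray"))
```

A real rule set would need many more entries and some way to measure false positives; the point is only that this approach needs a single scan.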
Another thing that might boost OCR accuracy is training Tesseract on "similar" works specifically. Tesseract 4 is trained on vast amounts of synthetic text in hundreds of fonts, but it's not impossible that retraining on ground truth data from exactly the kind of work you want might improve things. Many works we have are very similar in terms of typesetting, so perhaps retraining on, say, the Google scan of this work might improve things for other Google scans, which generally don't OCR quite as well as the "full-colour" IA scans (though the IA uses Abbyy, not Tesseract). That said, I have tried to train Tesseract 4 in the past, and it's a real pain to generate suitable ground truth image/text pairs, and you really need lots and lots. And if you "overtrain" the network, you might have to have multiple models, one for "Google scans", one for "low quality newsprint", one for "1700s printing", etc. Inductiveloadtalk/contribs 14:49, 4 July 2020 (UTC)
Thanks for all the fascinating info here -- I don't have anything to add, my technical chops are definitely not up to making any useful contributions in this area. But it's nice to understand the context a little better, much appreciated. -Pete (talk) 20:55, 15 July 2020 (UTC)

Old use of the term "advertisement" as a type of preface

Many of the older books we include have a section of the frontmatter named as "Advertisement" -- which seems to be a type of preface, or introduction. Here are some examples. It is unsurprisingly very difficult to do a web search for this meaning of the term "advertisement" -- so I thought I'd ask here. We should really make sure it's documented clearly on wikt:advertisement and probably create a Category for such pages (at a minimum, making such a category would help distinguish them from the pages that contain ads for other books, which are listed in Category:Advertisements). Does anyone know more about this older use of the term, and/or have any links discussing it? JesseW (talk) 03:12, 14 July 2020 (UTC)

@JesseW: I don't have any literature about it close to hand, but the "Advertisement" reflects publishing as it was arranged up to somewhere in the 18th century. For a large portion of works, the author paid the publisher/printer to have the work printed and sold, and in order to afford this a lot of works were sold by "subscription" (pre-order, sometimes before a single word is written). The "Advertisement", often then called a "Proposal", is the pitch used to persuade buyers to purchase such a subscription in advance, appearing either as a letter, a small pamphlet, a literal ad in a newspaper or magazine, or even in the author's (or sometimes the publisher's) previous publication. Several of Dr. Johnson's most well known works were published this way, for example. It is an ad, too, but its usage in the context is in the sense "public notice" more than commercial promotion. Once printed, the Proposal is often repurposed as the "Advertisement to the Reader" (not always, by any means), such as The Plays of William Shakspeare (1778)/Volume 1/Advertisement by Steevens: a sort of combination of an introduction and dust-jacket blurb designed to entice potential buyers.
The most famous example is probably Heminge and Condell's "To the great variety of readers" in the 1623 First Folio: From the most able, to him that can but spell: There you are numbered. We had rather you were weigh’d. Especially, when the fate of all Bookes depends upon your capacities: and not of your heads alone, but of your purses. Well! It is now publique, & you will stand for your priviledges wee know: to read, and censure [critique]. Do so, but buy it first. That doth best commend a Booke, the Stationer [publisher] saies. Then, how odde soever your braines be, or your wisedomes, make your licence the same, and spare not. Judge your sixe-pen'orth, your shillings worth, your five shillings worth at a time, or higher, so you rise to the just rates, and welcome. But, what euer you do, Buy. Censure will not driue a Trade, or make the lacke go. --Xover (talk) 08:50, 14 July 2020 (UTC)
Thank you so much!! That's very helpful. Now we just need to figure out what a good category name is for these... maybe Category:Subscription Advertisments (with a good explanation on the category page)? And the First Folio one is very entertaining! JesseW (talk) 23:33, 14 July 2020 (UTC)
@JesseW: Do we actually need a cat for them, though? --Xover (talk) 18:58, 15 July 2020 (UTC)
@Xover: I want one so I can partition the set of pages named "Advertisement" between that and Category:Advertisements (which I want to populate more so I can enjoy looking at them and wikilinking them and seeing which ones there are scans available of). And without a category, it's harder to tell if it's one that hasn't been considered, or considered and categorized as a subscription advert. Hopefully that's a good enough reason. :-) JesseW (talk) 02:22, 16 July 2020 (UTC)
@JesseW: I'm not opposed; I just don't quite see its utility (beyond your personal want for it). But the category system isn't my area so I'll leave it to others to say something intelligent about it. --Xover (talk) 15:41, 16 July 2020 (UTC)
OK, I'll leave this open for another week or two, then create the category above if no-one has a better idea. :-) JesseW (talk) 15:46, 16 July 2020 (UTC)

Missing caption

The image caption on Page:Aerial Flight - Volume 1 - Aerodynamics - Frederick Lanchester - 1906.djvu/38 does not appear, nor is it visible on Aerodynamics (Lanchester)/Chapter 1, where the page is transcluded. Can anyone say why? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:09, 16 July 2020 (UTC)

There is nowhere for it to display. Typically captions only show when using "thumb" or "frame" per mw:Help:Images. — billinghurst sDrewth 13:34, 16 July 2020 (UTC)

Styling of contractions

I'd like to hear from some other Wikisourcers on this:

Letters from an Oregon Ranch is a 1905 book which compiled newspaper columns. Throughout, the 1905 edition styles contractions with an extra space; for instance, the word don't is styled as do n't.

This seems to me to be a peculiarity of the printer's decision of how to present the text, rather than an attribute of the text. My preference is to "correct" these, i.e. to use don't in the above example. This seems consistent with Wikisource:Style_guide#Formatting.

One bit of "evidence": Since these columns were originally published by the Sunday Oregonian, we can see how that publication treated contractions. They did not use the extra space, as seen here: https://oregonnews.uoregon.edu/lccn/sn83045782/1903-05-31/ed-1/seq-40/#words=ELIZABETH+Elizabeth+ranch+Ranch

So, is this an issue that has been discussed or decided elsewhere? What have other Wikisourcers done in similar situations? Is there more specific guidance that I haven't found? -Pete (talk) 17:57, 15 July 2020 (UTC)

@Peteforsyth: I'm not aware of any discussions or guidance addressing this specifically; but I would personally say that in general this kind of thing is within bounds to correct. And in this specific instance I think this is just a kerning issue: there's not a space character there, it's just that the apostrophe is "monospaced" and taking up a full em-width. They then try to correct for it with manual kerning, and only succeeding somewhat, with rather inconsistent results. I would definitely correct it in this instance. --Xover (talk) 18:57, 15 July 2020 (UTC)
@Xover: it looks to me like the typesetter has maybe been inserting thin-spaces where the space would have been before the contraction: e.g. did-n't (did_not) and that-'s (that_is). It doesn't look as though it's just because the apostrophe sort of that font was a bit over-wide, or you'd see didn-'t, not did-n't.
Regardless, I agree it's something we can certainly reasonably class as a typographical artifact and ignore, just as we collapse the quasi-French spaced gaps before semicolons and on either side of em-dashes and don't explicitly reproduce ligatures. Not least because a thin space in HTML would allow a word to break over a line. Inductiveloadtalk/contribs 19:48, 15 July 2020 (UTC)
This is not an accident; this is a style. I've seen it in many older books. I don't think we should necessarily copy it, though.--Prosfilaes (talk) 20:41, 15 July 2020 (UTC)
No hanging offence whichever way you go, just be consistent. — billinghurst sDrewth 13:35, 16 July 2020 (UTC)

Thank you all, very helpful. FWIW, a bit more evidence that it was a printer's/publisher's decision: this contemporaneous book from the same publisher uses the same convention (see, e.g., p. 36, and the word(s) "Have n't".) -Pete (talk) 01:08, 17 July 2020 (UTC)
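For anyone who wants to apply this correction in bulk, a minimal sketch follows. The function name and the list of contraction endings are assumptions made for this example, and the pattern will not catch every case, so the output should still be checked against the scan.

```python
import re

# Matches a word character, then a normal, thin or narrow no-break space,
# then a contraction ending such as n't, 's, 'll, 're, 've, 'd or 'm.
# Both straight and curly apostrophes are accepted.
_SPACED_CONTRACTION = re.compile(
    r"(\w)[ \u2009\u202f](n['’]t|['’](?:s|ll|re|ve|d|m))\b"
)


def join_contractions(text: str) -> str:
    """Remove the printer's extra space inside contractions (do n't -> don't)."""
    return _SPACED_CONTRACTION.sub(r"\1\2", text)


if __name__ == "__main__":
    print(join_contractions("Have n't you heard? I do n't think that 's right."))
```

The trailing word-boundary check is there so that an opening quotation mark before a word beginning with "s" (as in a 'single' quoted word) is left alone.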

Inline Wikidata calls

I'm seeing a lot of author pages being changed, to use:

{{pd/1923|{{#invoke:Author|date|type=death}}}}

Surely it would be better to include the Wikidata call inside the template (but so that it is overridden, if a local value is provided)? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:05, 20 July 2020 (UTC)

Hirtle chart issues

I am not a regular user of Wikisource so apologies if I'm missing something.

I'm looking into some copyright issues, specifically the date before which works (at least in the United States) are automatically PD.

This page Help:Public domain has helpful information but I see two problems. The first is that it is a copy of the Hirtle chart. That chart has the licensing information at the bottom:

Use of this chart is governed by the Creative Commons Attribution 3.0 License.

However footnote 1 to this page states:

copyrighted by Peter Hirtle and released under the Creative Commons Attribution-NonCommercial License 2.0.

I don't know the full history, but I'm guessing that the chart used to be more narrowly licensed. I'm happy to see that it is freely licensed now but should this footnote be changed?

The second issue is that the section on works published in the United States claims it covers all works. This is not correct. If you look at the chart you will see that sound recordings are in a separate section, and the public domain date is before 1923.

I note that Commons handles this in the heading Works except sound recordings and architecture. Technically correct but not ideal as it might be missed. However, this chart is incorrect as it doesn't note the special handling of sound recordings.--Sphilbrick (talk) 14:25, 20 July 2020 (UTC)

Brazzaville Manifesto

Can I import the Brazzaville Manifesto? Written in French, it is the 1940 declaration by then General Charles de Gaulle creating a branch of his resistance effort in Africa. It can be found in French government publications.[1] Eighty years is past the copyright limit, right, and also, if it's an official government publication,[2] I assume it's available anyway due to that, is that right? I would provide my own English translation in a separate file, unless I can locate a professional (uncopyrighted) translation. Mathglot (talk) 07:00, 22 July 2020 (UTC)

To be clear, government works of the current French Republic are not public domain in France (tho laws, rulings, statutes, etc. are all public domain in the United States). Seems like you could argue this is a "collective work" published more than 70 years ago and thereby in the public domain. —Justin (koavf)TCM 07:34, 22 July 2020 (UTC)
Hmm. I'd be willing to entertain the notion that for purposes of our copyright policy, France Libre and Régime de Vichy are both "France" in this period as competing successor states to La Troisième République, and subject to copyright laws they inherited from there (as subsequently amended by successor states). de Gaulle's personal copyrights would not have expired, but I think it is reasonable to consider this as having been authored by France Libre in the person of de Gaulle rather than him personally. As Justin points out, France doesn't have a general exception for government works, so this still leaves it in copyright in France. However, I would personally be willing to read this as a declaration with some legal force, even a "constitution" of sorts (again, only for the purposes of our copyright policy). Given those factors (and it is stretching interpretations significantly here), {{PD-EdictGov}} would apply and it would be fine for hosting here. But that's just my immediate take on it and there's no guarantee the community would concur if it were challenged. If you make all assumptions about uncertainties conservative and all interpretations strict, the conclusion flips the other way.
However, as a French text it is itself out of scope for English Wikisource. If you intend to translate it yourself you can upload a scan of the original; set up an index for it; translate page by page in the Page: namespace; and then transclude the translated text into the Translation: namespace (see On Discoveries and Inventions for an example of the setup). But please don't upload a PDF of a translation: if it was previously published it would still be in copyright (translations get independent copyright), and if it was a user translation it should be in wiki format rather than PDF.
My cursory scan of the Wikipedia article didn't find anything that suggested it was a "collective work", and I'm not clear on what legal definition of "collective" (vs. "multiple authors") would change the term from pma. to after publication (terms for collective works usually run from the death of the last of the authors to die). --Xover (talk) 09:06, 22 July 2020 (UTC)

References

  1. France libre (1940). Documents officiels. [Manifeste du 27 octobre 1940, à Brazzaville. Ordonnances n ° 1 et 2, du 27 octobre 1940, instituant un Conseil de défense de l'Empire. Déclaration organique complétant le manifeste du 27 octobre 1940, du 16 novembre 1940, à Brazzaville. Signé: De Gaulle.]. [Official documents. Manifesto of 27 October 1940, in Brazzaville. Orders No. 1 and 2, of 27 October 1940, establishing an Empire Defense Council. Organic Declaration supplementing the Manifesto of 27 October 1940, of 16 November 1940, in Brazzaville. Signed: De Gaulle.]. Brazzaville: Impr. officielle. OCLC 460992617. 
  2. The author and responsibility of the 1940 printing is listed as France libre (Free France). The question of whether this book is an "official French publication" is not a simple one, since what group represented official France was disputed in 1940. In continental France and Germany, Marshal Pétain collaborating with the Nazis was considered the head of France, then called the "French State". In the U.K., Africa, and elsewhere, it was General de Gaulle leading the Free French out of London. After the end of WWII and the establishment of the w:Fourth French Republic, the w:Vichy regime and all its laws were declared void ab initio; as if France did not exist during those four years, or rather, was embodied in London with the Free French, and not on the soil of France. If one assumes the latter view, then this book is "an official French government publication".

Annoying error message when trying to move images to Commons

I often move files from Wikisource to Commons. (For me personally, this is more of a "tending to the backlog" task, as for the last few years I've tried to avoid getting files uploaded to Wikisource as I used to... but that's neither here nor there.)

However, in recent months there is an error message, which prevents me doing this work in many cases. The error message is frequently inaccurate; for instance, I attempted to add the {{NowCommons}} template to this file, and I got this message:

Error: Media utilising {{raw page scan}} should not be re-licensed for transfer to Wikimedia Commons.

Please read the instruction in the template for the process to improve the image (clean, resize, etc.) so it can be readied for insertion in the work. Once these steps are undertaken the image can be moved to Wikimedia Commons with the appropriate use and completion of the {{information}} template. The improved file can then be uploaded to Wikimedia Commons with an appropriate license tag allowable by that wiki.

This error message, however, is not helpful even when it does display at the intended time. Commons has a number of tools that make it better suited than Wikisource to adjusting descriptions, file names, etc., in many cases. And users may have their own reasons to prefer to do that work on Commons rather than Wikisource. Even if users neglect to do it at all (as I surely have in some cases), it can still be a benefit to move the file to a site that facilitates its use on other Wikimedia projects.

If the goal is to prevent work getting done, in my case, this template has been highly effective; I've mostly ceased moving these files. But I don't think that was the original intent.

I propose this error message, and the accompanying edit-blocking feature, should be removed. -Pete (talk) 18:13, 13 July 2020 (UTC)

@Peteforsyth: Don't use NowCommons, just add {{sdelete}} and state the reason. The whole concept of the imported images was to not have them moved directly with the existing name, and instead to have files renamed when uploaded to Commons. So the abuse filter is acting as expected. It was never the plan to have the files cleaned up and reuploaded to Wikisource, the plan was to load them to Commons directly.— billinghurst sDrewth 23:47, 13 July 2020 (UTC)
Are you thinking that using {{sdelete}} would have a different outcome than using {{NowCommons}}? It doesn't. Same insurmountable obstacle described above. I'd be fine with using whichever, but I've grown accustomed to using NowCommons because it's the one recommended in the interface, and because it spares me the need to have to type a custom reason for deletion. -Pete (talk) 21:49, 14 July 2020 (UTC)
@Peteforsyth: The whole guidance for raw images has always been not to use the existing raw image filename, and not to upload the corrected image here. Upload the image separately at Commons with a good filename, and nominate the file here for deletion. If that guidance is followed, then we don't have an issue. NowCommons will work perfectly fine in normal situations, and the filter is written to deal with raw images. If you go outside the guidance then ... <shrug> — billinghurst sDrewth 23:16, 14 July 2020 (UTC)
template:raw page scan has the text. What needs to be clarified, or how can it be made clearer, that the image should not be uploaded here? — billinghurst sDrewth 23:22, 14 July 2020 (UTC)
Sorry, we seem to have failed to communicate. What, exactly, leads you to believe that I can add {{sdelete}} to the file, thus contributing to the work of tidying up orphaned files that have been removed to Commons? Because I can't.
As for this guidance (and I read "guidance" to mean something different from "policy," perhaps you see that differently), <shrug>. There's a process that used to work for me; I was able to do work that, I believe, clearly contributed to both Wikisource and Commons. Somewhere along the way it stopped working. If the people who made the decisions that made it stop working are uninterested in feedback or tweaking things, so be it. I'll let those smarter than me, or possessing higher privileges than me, handle the backlog, and I'll just work on stuff I can work on. (But for what it's worth, that seems like a pretty dumb way to handle things on a wiki, and I'll be surprised if you make more progress on backlogs than you would if you'd entertain feedback from lowly users like myself.) -Pete (talk) 23:33, 14 July 2020 (UTC)
But, just so the record is clear, there are two things in play here. The first is a no-brainer idiotic situation that should be addressed, even if you disagree with the second.
  1. The error message accuses the user of doing something the user did not do. The end user tried to take action to get a redundant page deleted; the error message scolds them "Error: Media utilising {{raw page scan}} should not be re-licensed for transfer to Wikimedia Commons." That scolding is utterly unrelated to what the user was attempting to do, and makes Wikisource look stupid.
  2. As for the workflow question, I think you're wrong and you'd definitely get more productive work out of me if you'd listen to feedback, but meh. Who cares, I have bigger battles to fight. -Pete (talk) 23:39, 14 July 2020 (UTC)

For what it's worth, as a bystander who has not done any of this before -- I'm still very confused about what the work is that you two are arguing over, and what either the old or the new process is to do it. If someone feels moved, linking to a brief summary would be welcome! JesseW (talk) 04:17, 15 July 2020 (UTC)

@JesseW: I'm sure I could have explained more clearly the first time, sorry. I'll try again:
  1. This file on Wikisource should be deleted, because it's now on Commons. However, there's no efficient way for me (as a non-admin) to mark it for deletion. I can't add the {{NowCommons}} template (which I think is the better choice, as it spares me having to type a verbose explanation, and readily identifies to an admin where the duplicate file is, so they can confirm before deleting if they like). I also can't add the {{sdelete}} template (which I'd be OK with doing, even though it's a bit of extra work). Either way, I'm confronted with the error message I quoted above, and the edit fails to go through. It sounds from what Billinghurst is saying that the cause is an Abuse Filter setting. IMO that setting should be changed, because it's preventing constructive editing; but I don't know much about how Abuse Filter works, so that's as far as my suggestion goes.
  2. In addition, Billinghurst and I have different ways we like to process files like this, which have been uploaded to Wikisource, but (we agree) should really live at Commons. Billinghurst wants them to be downloaded, cropped etc., and then re-uploaded, under a better name, to Commons. I have the same end goal as Billinghurst, but I like to follow a different process, because it takes far less effort, but leads to essentially the same result (except, I guess, with an extra redirect at Commons -- maybe that's what Billinghurst objects to, I'm not sure.) I prefer to:
    1. Use an automated tool to transfer the file from Wikisource to Commons, which spares me having to save it to my drive, find the file, open an image editor, and then re-upload
    2. Use an automated tool on Commons to crop the image
    3. Move the file to a better name on Commons
    4. (In some cases, the file would greatly benefit from further image processing, and that's something I or somebody else might or might not do, down the road. But for many files, this is not mission-critical, and there's no reason the users of these sites should have to wait months or years or decades for some color correction etc. before they have access to the file. Maybe this part is also something Billinghurst objects to? Again, I don't know.)
I used to follow this process with no difficulty, having pieced it together with the advice of a number of helpful Wikisource editors over the years. In fact, I originally took up this process because I had great difficulty converting the Internet Archive's JP2 files to PNG, but Hesperian's bot would readily take care of that and upload the file to Wikisource (not to Commons). More recently, I've learned to make those conversions, and to do so pretty efficiently, so I no longer cause files to be uploaded to Wikisource. But, there's a backlog of many files like this that I (and presumably others) had Hesperian's bot upload to Wikisource.
With what I know now, it's easier just to create the files from scratch from the IA and upload redundant copies to Commons, rather than jump through the hoops Billinghurst seems to think I should. Which is OK with me. That's why #1 is far more important to me than #2. If I could use my old process on the existing backlog to transfer files, I would probably do so; but shifting to his process is unappealing, as it would make the work pretty slow and onerous.
Hope that clears it up. Let me know if I've still missed the mark. -Pete (talk) 07:53, 15 July 2020 (UTC)
@Peteforsyth: You've missed the mark. :)
Well, not missed the mark exactly; but the situation is a little confusing. You're seeing the edit filter message blocking your edit when you try to add {{NowCommons}} or {{sdelete}}, in which context the error message seems a non sequitur. But that's not actually the edit that's triggering the filter: this one is.
The edit filter is trying to prevent altering the license template on HesperianBot's raw image uploads, which is why the error message refers to re-licensing. The filter was added after you changed the license template in 2015 (so that edit went through), but any edit you now make to that file will trigger it because the altered license template is in the page.
And regarding workflows and what's sensible, a factor that isn't apparent here, but which is very relevant, is that you have the filemover right on Commons and immediately rename these files after transfer there.
In other words, I think it's worthwhile to take a step back, reset, and try to have an open conversation about the sensible workflow. And keeping in mind that it is possible that we can't (sub)optimise for the best workflow for one specific contributor, if the same workflow will not work for all contributors (and that factor may be your filemover right on Commons, or my +sysop here, or any number of other such factors). --Xover (talk) 08:23, 15 July 2020 (UTC)
Don't see where I missed the mark, still sounds to me like the Abuse Filter is identifying an edit as "abuse" (a rather distasteful term) that is in fact intended to be helpful. I'm surprised that it's not glaringly obvious that such behavior is off-putting to contributors, and should be corrected. How the heck I should know that the script is reporting on something from 2015, but stating that it's about something that is taking place right now...is beyond me. If you think it's acceptable for an error message like that to be triggered on an edit like that -- i.e., that such an action contributes in a positive way to a site that is intended to welcome good faith edits from anyone -- I don't know what to say to you. I think you're the one who's missing the mark.
I think you're also missing the mark if you're trying to enforce one and only one "correct" workflow. But as I said before, I'm fine with it. You guys can defend your little fiefdom on file moves, I'll just upload fresh files and let you deal with the detritus left over on Wikisource however you want. It doesn't really bother me, it's probably easier for me to start from scratch anyway. It would have bugged me if this all happened back when I was more reliant on Hesperian's bot, but those days are gone. -Pete (talk) 08:31, 15 July 2020 (UTC)
@Peteforsyth: I haven't said word one about how I think this workflow should be, or whether the status quo is in fact desirable. I've tried to explain what is actually happening, in a technical sense, and suggested that it would be good if we could step back and discuss this openly. And the reason I made that suggestion is that your first message in this thread read as you being annoyed, but that the causes of your annoyance were not necessarily accurate (there is no edit filter deliberately trying to prevent your addition of {{NowCommons}}; it is a, presumably unintended, side effect of a filter designed for something else, and which is being triggered by the unique circumstances affecting this one file and a few others like it, but not all HesperianBot raw images). A discussion may end up with essentially the status quo if necessary; but it may also end up with removing unneeded annoyances, and might conceivably end up with a completely new workflow that is better for everyone. I have no qualms about saying we don't want to allow something that someone wants to do if that is needed, but my starting position is always to see if there is anything we can do to eliminate annoyances. And right now I do not know where we'll end up because there is information I'm missing about your current workflow; the parts and factors of it that are important to you; the goals of the edit filter that is currently interfering with it (I'm not familiar with it; I just tracked it down now while debugging your problem); and what our options are, in terms of both technical capabilities, policy, and need for guidance for the broad mass of users (which we do need, because not nearly every user is as conscientious and careful as you are, nor have the skills that you do).
PS. If my use of "missed the mark" irked you, or read as some kind of censure or accusation, then I apologise. I was just trying to be funny by echoing your own last phrase back at you when there was in fact a factor involved of which you were unaware (because you don't have access to the relevant log). I was very deliberately trying to not be confrontational, and if I failed in that I can only apologise and ask that you read with more generous eyes than what precision my fingers on this keyboard can apparently produce. --Xover (talk) 09:08, 15 July 2020 (UTC)
  Comment and I was just saying how and what the community determined the situation around the files that were uploaded here and the expectation of the process, and the wording. — billinghurst sDrewth 11:53, 15 July 2020 (UTC)
OK, I regret failing to heed your advice, and "posting while annoyed" rather than taking a step back. And @Xover: I do appreciate your efforts to dig into the confusing technical details of what happened and why. I think I've expressed all I need to, and I can just leave it at that. If it means abandoning these files and letting others deal with them so be it. I'll keep an eye on this thread in case there are further developments though. -Pete (talk) 16:21, 15 July 2020 (UTC)
@Peteforsyth: To "circumvent" this issue (without disabling or making into a warning only Special:AbuseFilter/40, which might be the better idea, but needs an AF editor to do), you just have to avoid saving a page with {{raw page scan}} and a template beginning with "PD". In your exact case, when adding {{NowCommons}} or {{sdelete}}, removing either one of the RPS or PD templates would work. In fact, I think just inserting a space after the open braces and before "raw" or "PD" would do it. You've run into a weird edge case which is presumably supposed to be impossible to get into, but due to the condition pre-dating the AF rule, is accidentally possible on some pages. I don't know exactly what the circumstances were that led to rule 40 being made, but it certainly seems to me that blocking your exact workflow was not what was in mind. I assume what was not wanted was people dumping enWS's raw page scans to Commons without adding proper metadata (not that this is what you are doing, let me be very clear), and it just wasn't noticed that some pages already had the "hallmark heuristic" of such pages: the presence of {{raw page scan}} and a license template. That you ran into the rule is basically a software bug: frustrating, but not a criticism of your workflow. Inductiveloadtalk/contribs 20:14, 15 July 2020 (UTC)
Ah, thank you @Inductiveload:. Now I'm wondering whether this might connect to an odd fragment of discussion on Commons from last November: commons:Special:PermaLink/415419038#Really? This always stuck out to me as a bit of a nonsequitur, and Billinghurst never replied. Possible that Abuse Filter Rule #40 was created at that time? -Pete (talk) 20:43, 15 July 2020 (UTC)

OK, that is indeed very confusing and strange! I just tested out this problem, on File:Portland, Oregon, its History and Builders volume 1.djvu-365.png (which now, because of my test, doesn't show the problem anymore). For the benefit of any other bystanders, here's my explanation of it. The filter rule prohibits edits that result in a page containing both {{raw page scan}} AND {{PD-1923}} (or other license templates). It doesn't matter what other change is made to the page; you can't save it until one of the two conflicting templates is removed. This has basically nothing to do with the process of moving files to or from Commons, or the different workflows that billinghurst and Peteforsyth were discussing -- it's just a quirk of an interaction between that filter rule and those particular pages (which we should really just fix via a bot changing the pages so they don't trigger the rule). Hope that helps! JesseW (talk) 15:45, 15 July 2020 (UTC)
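As a rough illustration of the rule described above, the condition can be sketched in Python. This is not the actual definition of Special:AbuseFilter/40, which is not quoted in this thread; the function name and both regular expressions are assumptions made for the example, based only on the description that a save is refused when the resulting page contains both {{raw page scan}} and a template beginning with "PD".

```python
import re

# Hypothetical approximation of the condition described in this thread.
# The real filter rule is not quoted here and may well differ.
RAW_PAGE_SCAN = re.compile(r"\{\{raw page scan", re.IGNORECASE)
PD_TEMPLATE = re.compile(r"\{\{PD")


def would_trip_filter(new_wikitext: str) -> bool:
    """Return True if the page, as it would be saved, holds both templates.

    Only the resulting page text is inspected, so the outcome does not
    depend on which part of the page the editor actually changed.
    """
    has_raw_scan = RAW_PAGE_SCAN.search(new_wikitext) is not None
    has_pd_license = PD_TEMPLATE.search(new_wikitext) is not None
    return has_raw_scan and has_pd_license
```

Under these assumptions, adding {{NowCommons}} to a page that already carries both templates is still refused, while removing either template (or, as guessed earlier in the thread, putting a space after the opening braces) lets the save through.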


Ready to move on. I appreciate the efforts of several editors to help me understand what was happening on a technical level. In the final analysis, it appears to me this issue is unlikely to affect many users, and for me personally it's not worth further argument. I also was a bit more heated in my comments than advisable, which I regret, and that makes further discussion more difficult.

For the sake of summary, here's how it seems to me: I am a big believer in the "Be Bold -- Revert -- Discuss" methodology. To me, this situation clearly shows why that methodology is important. I made an edit ("Be Bold"); Billinghurst cleaned up after me (analogous to "Revert"); but he did not "Discuss". Instead of discussing, (I believe) he took an action that required elevated privileges (setting an Abuse Filter rule), and did not tell me about it. I can't prove this, and he hasn't confirmed or denied it; but that's what I believe.

I believe a great deal of effort on my part, in trying to figure out a confusing situation etc., and on the community's part in trying to parse the wall-of-text above, could have been avoided, with a couple sentences of basic communication ("Discuss"). "Hey, I noticed you left a mess. Were you planning on cleaning it up? Would you mind taking some steps to avoid that kind of mess in the future?" Something like that. For those who don't know, I tend to respond well to that approach.

Anyway -- I'm ready to move on, and my apologies for my role in making this a bigger deal than it needed to be. -Pete (talk) 22:16, 16 July 2020 (UTC)


Update: Xover disabled the Abuse Filter, and instead blacklisted the {{raw page scan}} template (etc.) from the File Importer. (This seems to have coincided with, but not caused, a related problem with File Importer.) -Pete (talk) 16:09, 23 July 2020 (UTC)

The FileImporter issue seems to be a long-standing problem (since 2018 at least) that's only been visible to users sporadically (it mostly just showed up as log messages for the developers/operations/release engineering), but that was exacerbated by changes in the new version of MediaWiki that was deployed this week (it hit enWS yesterday). The Phabricator ticket for this issue already has a patch that probably will fix this, but it is not yet clear when the fix will be deployed (it might be a matter of hours, but it might also take weeks; I'll try to update if I get anything definitive). Oh, and I had no hand in changing the abuse filter (the other changes I noted there were just coincidental). --Xover (talk) 16:28, 23 July 2020 (UTC)
Okay, sorry about that -- now I see who made the change. Reverting (silently) the change they (silently) made last year, a few hours after this file move. Pretty amused at this point at how far this individual, with some pretty elevated permissions on various of our sites, went without communicating with me. I rather doubt there were many other people using FileImporter on raw page scans at the time, but maybe some people just enjoy playing with their extra fancy tech toys more than they enjoy communicating with other users. -Pete (talk) 19:16, 23 July 2020 (UTC)
Patch has been merged and is currently targeted at 1.36.0-wmf.2, scheduled for rollout 2020-07-28 (will hit us next Wednesday), but may still get backported and deployed out of cycle. --Xover (talk) 21:06, 23 July 2020 (UTC)
The patch has been backported and deployed out of cycle, and testing suggests this problem should now be fixed. --Xover (talk) 17:24, 24 July 2020 (UTC)

Does the MassDelete gadget work as intended here? It's not working at mul.ws

Since lij: just graduated from mul:, we have thousands of pages to delete. It's pretty exhausting doing it manually and while MediaWiki:Gadget-massdelete.js is very helpful, the "DELETE ALL PAGES!" option is not working, leaving me having to click 6,000 times. Does delete all pages work here? If so, can someone help me decode why it's not working on mul.ws? Thanks. Additionally, I tried a pywiki solution outlined at mw:Manual:Pywikibot/PAWS and could not get it to work. If anyone wants to help me with that it would be appreciated but that seems much more involved. Justin (koavf)TCM 10:04, 24 July 2020 (UTC)

@Koavf: Our mass delete gadget works (I've used it today), but since I don't recognize the "delete all" terminology you refer to I suspect we are not using the same code. Compare the mul version with our MediaWiki:Gadget-massdelete.js to check. (I'm a bit busy just now, but I'll try to take a closer look when time allows). --Xover (talk) 11:40, 24 July 2020 (UTC)
@Xover: I imported it from en.ws, so it's identical. If you go to a category, you don't see "DELETE ALL PAGES!"? —Justin (koavf)TCM 12:21, 24 July 2020 (UTC)
@Koavf: That function is provided by mul:MediaWiki:Gadget-FastDelete.js which was added by Candalua in 2018. MediaWiki:Gadget-massdelete.js uses a dummy Special:MassDelete page to provide a text field where you can paste in page names for deletion. (I have a todo to modernise it and make it more convenient to use, but it's not a high priority) --Xover (talk) 13:06, 24 July 2020 (UTC)
Clearly, I was confused. But the gadget still doesn't actually delete all pages. Does it do so here? Is there any way to make this list of pages using the script? When I go to mul:Special:MassDelete, it allows me to put in pages to delete but no matter how few or many I enter, when I click on "Delete", nothing happens: the special page just reloads itself and the content that was to be deleted is still on the wiki. Oh, it just started working when I stopped using the deletion rationale; leave that blank, and it works. Justin (koavf)TCM 13:32, 24 July 2020 (UTC)
@Koavf: There's no obvious reason why the rationale should not work (worked fine for me here earlier today). Do you get anything in the javascript console? Which skin are you using? --Xover (talk) 13:48, 24 July 2020 (UTC)
Monobook and it's working like a charm now (well, actually better than a charm since this actually works). Thanks! —Justin (koavf)TCM 13:49, 24 July 2020 (UTC)

Please import

s:mul:Richard Nixon's Phone Call to the Moon and ping me once it's done so I can delete it there. Thanks. —Justin (koavf)TCM 02:50, 21 July 2020 (UTC)

@Koavf: Richard Nixon's Phone Call to the Moon   Done. Can I please leave you to clean it up and add it to Nixon's author page? Thanks. — billinghurst sDrewth 05:03, 21 July 2020 (UTC)
@Billinghurst: It's already on there. Thanks. —Justin (koavf)TCM 05:05, 21 July 2020 (UTC)
@Billinghurst: also: mul:Slee v. Erhard, Deposition of Margaret T. Singer, Ph.D. (1987), mul:State of Arizona v. James Arthur Ray, Case No. V1300CR201080049, Defendant James Arthur Ray's Response to State's Motion in Limine re: Witness Rick Ross, and mul:State of Arizona v. James Arthur Ray, Case No. V1300CR201080049, Defendant James Arthur Ray’s Motion in Limine (No.9) To Exclude Testimony of Rick Ross. Please ping me once it's done, so I can delete them. Thanks. —Justin (koavf)TCM 12:14, 25 July 2020 (UTC)
@Koavf:   Done Please perform the required maintenance at this end of links, template updates and WD additions. — billinghurst sDrewth 02:10, 26 July 2020 (UTC)

I keep on finding more. Please ping me once it's done:

I'll keep on adding to this list. —Justin (koavf)TCM 00:17, 27 July 2020 (UTC)

Statistics

At the top of the main page there is written: "478,822 texts in English". I clicked it, hoping to learn more what is meant by "texts" (is the whole Encyclopaedia Britannica one text or is by a text meant each of its entries?), and got to Special:Statistics. That page includes many different numbers, but none of them is the one from the main page. The nearest of them is "content pages" (807,021) which is almost twice as much as the number in the main page. Would it be possible 1) to include the number from the main page into the statistics page too, 2) to explain there what exactly the numbers mean? --Jan Kameníček (talk) 22:22, 24 July 2020 (UTC)

See Template talk:ALL TEXTS for how the 478,822 figure is calculated. It's updated once a day. The content pages figure includes most of the namespaces (but not Page:). Beeswaxcandle (talk) 23:14, 24 July 2020 (UTC)
Also look at WS:AN and open up the collapsed "Wikisource snapshot" for more numbers. — billinghurst sDrewth 08:24, 25 July 2020 (UTC)
And of course phetools — billinghurst sDrewth 08:34, 25 July 2020 (UTC)
@Beeswaxcandle, @Billinghurst: Thanks for explanation, now I understand the number from the main page, but the problem as such stays. When people click on a clickable word or number, they expect to get somewhere where they get more information about it. But when they click on the number in the main page, they get to a statistics page which gives no information about the number.
As for the other number (currently 807,159), I am equally confused as I was before. Special:Statistics calls it "contents pages" while Wikisource:AN calls it "No. of articles", but none of them explains what it really is. Especially the latter label confuses me, as I would expect that "article" means encyclopedia/magazine/journal article in Wikisource context, but that is probably wrong as the number is too high for this. I guess it might be number of all mainspace pages and subpages including redirects and disambig. pages. If this guess is right, then its label should probably be changed or at least explained as ordinary reader who gets to the statistics page from the main page cannot understand it. --Jan Kameníček (talk) 09:06, 26 July 2020 (UTC)
@Jan.Kamenicek: They are just numbers. Per mw:Manual:Article count, for us that will be main ns subtracting redirects; not certain whether it covers author ns or not. Never bothered to find out. If you want stats, then https://stats.wikimedia.org/#/en.wikisource.org though they are WP-oriented and not really valuable for the WSes. — billinghurst sDrewth 10:46, 26 July 2020 (UTC)
Thanks again, but it is still information presented only here and only for me and possibly a few more contributors. If the statistics is not useful, it should not be presented, but if it is presented, it should be done in an understandable way. Current situation is on the halfway: the numbers are presented to readers (in or from our main page) but not explained. Clicking the clickable number makes the confusion even bigger. --Jan Kameníček (talk) 11:29, 26 July 2020 (UTC)
Special:Statistics is a default page for a mediawiki installation, if you want to have a go at it, and do the research go at it. Don't expect others to necessarily chase low relevancy components.

If you follow the link to the content, it does say! Main namespace, hide redirects.

For the "snapshot" I would say that the descriptions are reasonable, though I am open to suggestions of improvements. — billinghurst sDrewth 11:49, 26 July 2020 (UTC)
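For anyone who wants to look at the raw figure behind "content pages", here is a minimal sketch in Python that builds a MediaWiki API request for site statistics and reads the "articles" field from the JSON reply. The helper names are made up for this example, the sample reply is a trimmed, illustrative payload that reuses the 807,159 figure quoted above, and how the count is defined on this wiki is governed by mw:Manual:Article count, not by this snippet.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint for English Wikisource.
API = "https://en.wikisource.org/w/api.php"


def statistics_url(api: str = API) -> str:
    """Build the URL for a siteinfo statistics query (hypothetical helper)."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",
    }
    return f"{api}?{urlencode(params)}"


def content_page_count(response_text: str) -> int:
    """Extract the 'articles' figure (content pages) from a JSON reply."""
    return json.loads(response_text)["query"]["statistics"]["articles"]


# Trimmed, illustrative reply; a real reply carries more fields.
sample_reply = '{"query": {"statistics": {"articles": 807159}}}'
print(statistics_url())
print(content_page_count(sample_reply))
```

The URL building and the parsing are kept separate so that the parsing step can be tried on a saved reply without making a network request.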

I see, I was not aware of the fact that the design of the statistics page cannot be influenced locally. In that case I suggest to found a page based on the snapshot from the WS:AN and link the number of texts from the main page there. I have made some minor modifications in my sandbox to start with. The modifications include: The table is not collapsed, No. of texts at the very top, better explanation of some labels such as "No. of pages in Main" --> "No. of pages in main namespace excl. redirects and disambiguation pages", "No. of articles" --> "No. of pages in main namespace"…, and robin’s tool replaced by phetools. Anybody can feel free to make any changes there. What do you think about this? --Jan Kameníček (talk) 12:43, 26 July 2020 (UTC)

Weird technical issue

This edit appears to have converted some of the field names on Index:Nationalism.djvu from English to (IIUC) Bengali. Is this just me, or is something getting messed up? Pinging @Satdeep Gill:, who made the edit, for info, but to be clear I'm not saying it's his fault. BethNaught (talk) 18:21, 25 July 2020 (UTC)

Huh, never mind; I purged the page and it went back to English. Probably a caching issue; I'll raise a phab ticket. BethNaught (talk) 18:23, 25 July 2020 (UTC)
I can't replicate this. I don't see any Bengali on any of the following pages:
Justin (koavf)TCM 18:24, 25 July 2020 (UTC)
Thanks for checking; I raised phab:T258867. FTR I suspect it was actually Punjabi, looking at Satdeep's user page; I'm sorry for leaping to a wrong conclusion. BethNaught (talk) 18:38, 25 July 2020 (UTC)
@BethNaught: Interesting. All I did was change the status of the Index from "to be validated" to "done". Can we replicate it somehow? --Satdeep Gill (talk) 06:13, 27 July 2020 (UTC)

Shantiniketan; the Bolpur School of Rabindranath Tagore

Thought I'd have a try at transcluding a complete volume that I've just finished proofreading, but I've realized I'm just completely out of my depth with understanding about creating and linking the pages together. If someone has the time and could fix my mess up for me, I promise not to try another full transclusion again. Sorry for dropping this. Sp1nd01 (talk) 21:09, 27 July 2020 (UTC)

@Sp1nd01: Fixed. And it wasn't particularly messy. All this transclusion stuff is hard to figure out, but reasonably doable once you've done it a few times. Don't give up, and always feel free to ask for help if you run into trouble. --Xover (talk) 13:55, 28 July 2020 (UTC)
Thanks again for your help and encouragement Xover, it's nice to see the book tidied and presentable! I'll study and try and understand what you've done to get it that way. Sp1nd01 (talk) 19:38, 28 July 2020 (UTC)

HathiTrust

I have problems with downloading books from HathiTrust Digital Library. I use Hathi Download Helper 1.1.9, but recently I have been receiving only the message "Autoproxy: Verifying connection to… (trials x)" where x rises even to several thousands without any result and so I am not able to download anything. Is anybody experiencing these problems? --Jan Kameníček (talk) 07:18, 30 July 2020 (UTC)

Proof reader needed: 1812 English handbill

The image from which I have just transcribed The trial and execution of William Booth, seen on its talk page, is of a very poor quality 1812 handbill. I'd be grateful if someone comfortable with such old English printing would kindly take a look and see if they can resolve the remaining illegible text. It's quite short. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 22:49, 30 July 2020 (UTC)

@Pigsonthewing: With respect there are ways of making this so much easier than the way you have attacked this to date! May I suggest setting up an Index: page (maybe use e.g. Index:Winston Churchill to Franklin D. Roosevelt - NARA - 194822.jpg as a guide?) and then transfer the transcription you have already made to Page: space and link the result into the new Index: page?
Only then do you have a framework for collaborative proofreading; where the strengths of the system and review mechanism work for you and not against.
Please, please, please use the tools. Not every chisel appreciates being used as a screwdriver! 114.78.202.244 00:07, 31 July 2020 (UTC)
And not every sledgehammer appreciates being used to crack nuts. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 00:09, 31 July 2020 (UTC)
Nobody is disputing you have made a sterling start on a transcription and I am not suggesting you waste that effort. I am merely pointing out the method by which you can gain the benefit of cooperation and cross-verification within the system tools. Being antagonistic means everybody's time is being wasted and will ultimately result in your good work being deleted at some time in the (maybe far) future. I am not inclined to help you further (especially now.) 114.78.202.244 00:35, 31 July 2020 (UTC)
Interesting text. I've set up the index page as suggested, and begun proofreading. -Pete (talk) 00:53, 31 July 2020 (UTC)
"will ultimately result in your good work being deleted" Please cite a policy supporting this remarkable claim. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 09:14, 31 July 2020 (UTC)
he is referring to the periodic deletion nominations at Wikisource:Proposed deletions, which practice i responded to here Wikisource:Scriptorium/Archives/2018-01#Scan-backed. the venues change but the deletionist sword of Damocles does not. ip by their banter talks like a banned admin. Slowking4Rama's revenge 21:33, 31 July 2020 (UTC)

Is it all right to translate a novel into another language?

If it is OK, I would like to translate a novel in public domain here into another language. Green (talk) 09:28, 23 July 2020 (UTC)

P.S. I mean that adding new pages in this project after translation.Green (talk) 10:03, 23 July 2020 (UTC)
@Green:, you absolutely are allowed to use any work at Wikisource for the basis of a translation (actually, you can use it for any purpose). However, if the target language is not English, the result belongs on the relevant language Wikisource. Not all works at en.wikisource.org are necessarily accepted at the other languages. English Wikisource policy is public domain in the US, other languages can vary (e.g. German Wikisource requires PD in Germany). Inductiveloadtalk/contribs 10:34, 23 July 2020 (UTC)
@Inductiveload: Thank you for your quick reply. I am going to translate an English novel into Japanese, so that it seems to me there is no problem. ----Green (talk) 10:46, 23 July 2020 (UTC)
If you have doubts, you can check the jaWS version of WS:COPY: ja:Wikisource:著作権, and editors at the Japanese Wikisource Scriptorium might be able to assist with anything specific to jaWS, but feel free to ask questions here too. Inductiveloadtalk/contribs 11:10, 23 July 2020 (UTC)
If published through 1928 and thus in the public domain in the USA, go to Old Wikisource if still copyright-restricted in Japan and thus barred from Japanese Wikisource. Telling us which works are to be used may get a better answer.--Jusjih (talk) 20:08, 2 August 2020 (UTC) (Your Eastern Asian cultural bridge)

User:350bot

For creating the remaining red links of A Chinese and English vocabulary, in the Tie-chiu dialect#Contents. I might use AWB for future tasks, but this task will probably be manual. Suzukaze-c (talk) 06:58, 28 July 2020 (UTC)

Adding color images where the original was black and white

The Waning of the Middle Ages has a bunch of images, many of which are famous artworks that have higher-resolution copies on Commons. Previously, the images were page scans, but I just replaced them with color versions for those images where I could find a color version of the same image on Commons. Was this the right thing to do? Wikisource:Image guidelines, as far as I can tell, doesn't address this. Vahurzpu (talk) 22:34, 31 July 2020 (UTC)

I agree with and appreciate what you did, and I'm interested to hear what others have to say. I also occasionally "upgrade" to a higher quality image, though I have not yet run across a situation where this means going from greyscale to color. (Here's one I did today: Page:Centennial History of Oregon 1811-1912, Volume 1.djvu/817) It seems to me the reader is always better served with a higher quality image. As long as there is transparency in the form of side-by-side proofreading view, a reader who is specifically interested in how it was originally published has the ability to learn about that; such readers will be consulting the page scans regardless. -Pete (talk) 23:35, 31 July 2020 (UTC)
yeah, i would say you made the right call. the ethic being "bringing value to the reader". we have verisimilitude issues, (even if we are better than gutenberg) and fanatic replication of works may decrease utility in some cases. Slowking4Rama's revenge 13:56, 1 August 2020 (UTC)
I, too, think this is OK, with the caveat that there should be a user annotation noting the substitution. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:06, 2 August 2020 (UTC)
In my opinion, if the original publication used greyscale images, then greyscale images are preferable here too; but if the original publication used colour images, but the scan is black and white, then colour images are preferable here. However, it's probably fine to use either. Sometimes using colour images instead of black and white is easier, and that is fine. —Beleg Tâl (talk) 15:36, 2 August 2020 (UTC)
If the original publication had black and white images, we should definitely stay faithful to the original and keep them black and white too. I agree with Beleg Tâl that replacing them for coloured ones when only scans are black and white while the original version was coloured too is not only OK, but desirable. --Jan Kameníček (talk) 15:42, 2 August 2020 (UTC)
I have no idea whether the original publication had them in color in this case; however, it's perfectly plausible that they were originally colorful. They're on separate picture plates, rather than integrated with the text, and page 143 makes reference to "red and blue cherubim" in a painting, which are very obvious in the original but not in the scan. Vahurzpu (talk) 19:53, 2 August 2020 (UTC)
We did something similar for The Russian School of Painting, a book about Russian art. The scan was entirely in greyscale. But since the book discusses art, the reader is better served by faithfully presenting the works of art being discussed, rather than faithfully presenting the scan or faithfully reproducing limitations present because of the method of publication. And because the work is scan-backed, a reader can view any page side-by-side with the backing scan, if they wish to do so. When books publish images that are small, we do not "faithfully reproduce" tiny images. Rather, we make available the highest resolution scans that we can, even if that means a better quality image than the scan, and sometimes better than the printed page. Yes, we sometimes have access to original plates rather than printed copies. When the original book has yellowed with age, or was printed on paper that was greyish, we often correct for color in the image. Since we can (and do) provide images of higher quality than the scan or even a printed page, I see no reason to limit ourselves to slavishly reproducing greyscale scans of prints from plates made from photographs, when a photograph of the original is available. --EncycloPetey (talk) 00:30, 16 August 2020 (UTC)
  • This has just come up in Creative Commons for Educators and Librarians, which I am working on now. The black-and-white images are often to be found on Wikimedia Commons; but the version of the images used may not be the same as the images as they exist now. Should (1) the black-and-white image be used, (2) the color image as it existed at the time be used, or (3) the color image as it exists currently be used? There are also a number of images from Flickr which need to be uploaded (if they are not present on Wikimedia Commons already). TE(æ)A,ea. (talk) 12:39, 17 August 2020 (UTC).
    • I have replaced all of the images with their color counterparts, with the exception of this image, which I could not access. I replaced images which represented text with the text which the images represented, as well. TE(æ)A,ea. (talk) 12:34, 25 August 2020 (UTC).

Proposing to delete disambiguation pages that are subpages of works

The following discussion is closed:

deleting the disambiguation pages


Preferring to have this as an open conversation at Scriptorium where more eyes will see it rather than WS:PD, though I can move it there if required, especially as we have lightly had this conversation previously.

In times past there were created pages within works

and more recently

If you run report petscan:657164 you will see most that are there today as they don't have WD entries. I will search for all of them at a later time.

These are not pages that exist within the works themselves so have no particular value within the works. They don't sit linked within works, they are just predominantly orphaned pages. Also noting that there are page listings for each of these works that will effectively disambiguate these pages. I will also note that for our numerous biographical works it will be a huge exercise to propagate

If we require disambiguation pages they should be sitting as root pages, and disambiguate all works of the name, as has been done with Emerson. Not that I am enamoured with such pages, nor certain that they actually reflect true disambiguation pages, nor that they are effective or complete. [That is not the conversation today!]

We would also need to update our guidance at Help:Disambiguation pages to be explicit to not create such pages. I would probably write a filter that looks for such pages to deal with any future circumstances. — billinghurst sDrewth 00:15, 26 July 2020 (UTC)

I'm not sure about the subpage ones, but the Supreme Court ones (like Jones v. United States) do seem useful to me, as likely catches for mistaken off-wiki links. But the subpage ones should probably go, indeed. JesseW (talk) 01:25, 26 July 2020 (UTC)
@JesseW: They stay, they are not subpages, and we typically add to them. I have been adding them to existing disambig items at WD where they exist, and will get to creations at some point. That is the actual point of the query. — billinghurst sDrewth 01:44, 26 July 2020 (UTC)
@Billinghurst: Admittedly I am insufficiently caffeinated just now, but I am completely failing to understand what you're talking about here. Aren't the pages you link just the normal main work pages for the works in question? I'm not really seeing the disambiguation aspect, or any subpages. Help? --Xover (talk) 06:46, 26 July 2020 (UTC)
@Xover: I just listed the parent works that contain disambiguation subpages, one needs to run the petscan report to see the pages. — billinghurst sDrewth 07:36, 26 July 2020 (UTC)
petscan:16913508 <= main ns, use template:disambiguation and are subpages; 38 pages — billinghurst sDrewth 07:41, 26 July 2020 (UTC)
lightbulb goes on Oh, I think I see:
Where the idea is that we don't need … /Abdera to dab … /Abdera (Spain) and … /Abdera (Thrace) because navigation in the work (toc, indices, cross links, etc.) points directly to the intended target (or at least should do so)?
Provided I (finally) understood that correctly, I think I agree we should get rid of these that exist now. In the general case I think we'd either want to ban these and never use them, or to create them proactively for all such cases. But even if one were to use them they should not be used in any normal internal links (intrawork they are a detour from the intended target, and interwork there shouldn't be a link unless it's clear whether the target is the entry on Spain or Thrace). Which means that their sole purpose would be as targets of interwiki or off-wiki linking, which, in my opinion, would need some pretty clear and compelling use case to be merited. --Xover (talk) 08:24, 26 July 2020 (UTC)
Pretty much. The specific pages are linked from the parent (somehow). If we were to disambiguate it would be [[Abdera]]. Though to do that for every reused term in our works would be a nightmare. Imagine [[John Smith]] or is it [[Smith, John]]. Anything like that would have to be systematically prepared (Listeriabot?), which would mean that we would have to get all our works into Wikidata, and that is a nightmare scenario with how that is done here. — billinghurst sDrewth 11:39, 26 July 2020 (UTC)
All our works should be in Wikidata anyway. If there's a significant backlog it may be a task for a bot, or otherwise a matching tool of some kind, with automated suggestions and human verification. Does anyone have a measure (or estimate) of the task at hand? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 08:52, 22 August 2020 (UTC)
  Delete x100. I've already created Category:DNB disambiguation pages and Category:EB1911 disambiguation pages to collect these in one place in preparation for such a deletion proposal. —Beleg Tâl (talk) 00:18, 29 July 2020 (UTC)

Closing discussion and deleting these couple of dozen pages. — billinghurst sDrewth 04:40, 28 August 2020 (UTC)

  Comment Added text to Help:Disambiguation to reflect this update (see special:diff/10413151) — billinghurst sDrewth 05:00, 28 August 2020 (UTC)
This section was archived on a request by: — billinghurst sDrewth 05:01, 28 August 2020 (UTC)

Policy on substantially empty works

[This is imported from WS:PD, where it applies to multiple current proposals, and several other works].

We have quite a few cases of works that are "collective" or "encyclopaedic" in that they comprise many standalone articles of individual value, which are basically just "shell pages", with no substantial content of any sort, not even imported scans or Index pages. For example, and this isn't intended to make any statement about these specific works, they're just examples and they may well get some work done soon during their respective WS:PD discussions:

Based on the usual rate of editing for things like that, unless dragged up into a process like WS:PD, they'll remain that way a very, very long time. I think perhaps there might be a case to host a mainspace page for this work, even though there is zero, or almost zero actual content. Do we want:

  • Mainspace pages where there is a tiny bit of information like header notes, scan links and maybe detective work on the talk page (not in this case). This provides a place for people to incrementally add content. Also gives "false positive" blue links, since there is actually no "real" content from the work itself, or
  • Do not have a mainspace page until there's some content. Only host this in terms of author/portal scan links, much like we do for something like a novel.

Personally, I lean (gently) towards #2, but with a fairly low bar for how much content is needed. Say, Indexes, basic templates, a title page and one example article. Ideally, a completed TOC if practical, especially for periodical volumes/numbers. It is fair to not wish to transcribe entire volumes of these works, it is fair to not want to import dozens of scans when you only wanted one, it is fair to only want an article or two, but it's not fair, IMO, to expect the first person who wants to add an article to have to do all the groundwork themselves, despite having been lured in with a blue link. That onus feels more like it should be on the person creating the top-level page in the first place.

I do see some value in periodical top pages with decent lists of volumes and scans where known, because these are often tricky and fiddly to compile from Google books/IA/Hathi, so it's not useless work, even if there are no imported scans (though imported is better than not).

We currently have a large handful of collective works listed for deletion right now in various levels of "no real content", and, furthermore, every single periodical that gets added can fall into this situation unless the person who adds, so I think we could have a think about what we really want to see here. Inductiveloadtalk/contribs 15:43, 3 July 2020 (UTC)

  • I believe that, if there is no scan as an Index: page, the main-namespace page should not exist unless it is being actively completed or is already mostly completed. A few pages (of the volume itself) are not very helpful, and are entirely useless if there is no scan given. TE(æ)A,ea. (talk) 15:59, 3 July 2020 (UTC).
  • I think such preparatory information would ideally be on more centralized WikiProject pages (for the broad subject), both for clarity and to assist in keeping different efforts consistent -- but that it certainly should be retained as visible to non-admins. I think that the red vs blue link issue is minor (but not totally negligible) and outweighed by the disadvantages of hiding the history of previous efforts. I strongly encourage redirecting such pages to appropriate WikiProject pages (after copying over the details there). JesseW (talk) 18:11, 3 July 2020 (UTC)
  • @JesseW: I agree that history shouldn't be deleted, but I think we should approach this in terms of what we want to see from these works, rather than what to do with the handful of examples at PD. There are hundreds of periodicals we could have but don't, and this applies to those as well. If we can come to a conclusion about what is and isn't wanted, we can make all the deletion requested works conform to that easily enough. Inductiveloadtalk/contribs 20:55, 3 July 2020 (UTC)
  • I think these pages are necessary to list index pages and external scans of multi-volume works (such as encyclopaedias and periodicals) especially if they are wholly or partly anonymous or have many authors or are simply large. I think it makes no difference whether such pages are in the mainspace, the portal space or the project space (except that it is harder to find pages outside the mainspace). The point is that these works often have so many volumes (often dozens or hundreds) that they must have their own page, and cannot be merged into a larger portal or wikiproject. If the community starts insisting on index pages, what will happen is the rapid upload of a large number of scans for the periodicals that already have their own page. Likewise if the community insists on transclusion. I also think it is reasonable to have a contents page in the mainspace, as it allows transclusion of articles. Most importantly, new restrictions should not immediately apply to existing pages that were created before the introduction of the restrictions. This is necessary to prevent a bottleneck. James500 (talk) 23:55, 3 July 2020 (UTC)
move the works to a maintenance category, and i will work them; delete them and i will not: i find your sword of Damocles demotivating. Slowking4Rama's revenge 01:55, 5 July 2020 (UTC)
@User:Slowking4: I am not proposing a sword of Damocles. I agree that the imposition of deadlines is counter-productive. I do not support the deletion of any of these pages. I would prefer to see them improved. James500 (talk) 04:38, 5 July 2020 (UTC)
TEA is on his usual deletion spree. not a fan. will not be finding scans to save texts, any more. he can do it. Slowking4Rama's revenge 00:15, 6 July 2020 (UTC)
The entire point of moving this here, and not staying at WS:PD is to decouple from the emotions that get stirred up in a deletion discussion. Let's keep deletion out of this. If we come up with some idea of what we do and don't want, then we can go back to WS:PD and decide what to do. I imagine that all that will be needed will be a fairly limited amount of housework to bring those works up to some standard that we can decide on here, and all the collective works there will be easy keeps. Hopefully with some kind of consensus that we can point at to outline a minimum viable product for such works going forward. There are hundreds and thousands of dictionaries, encyclopedias, periodicals and newspapers that we could/will, quite reasonably, have only snippets of. How do we want to present them? What, exactly, is the minimum threshold? Let's head all those future deletion proposals off at the pass, because deletion proposals often cause friction. Inductiveloadtalk/contribs 00:47, 6 July 2020 (UTC)
and yet deletion is the default method to "motivate" quality improvement. i reject your assertion that "emotions get stirred in a deletion discussion", rather, anger is a valid response to a repeated broken process being kicked down on the volunteers. it is unclear that a minimum threshold is necessary, rather a functional quality improvement process is. until we have one, you should expect to see this periodic stirring of emotions, as the non-leaders act out. Slowking4Rama's revenge 11:53, 9 July 2020 (UTC)
@Slowking4: Thank you for presenting this opinion, and I'm sorry if I have not made myself clear. We do need to figure out how to avoid a de-facto process of using WS:PD as an ill-tempered ad-hoc venue for "forcing" improvements on people who have somehow managed to generate works that are so in need of improvement that another user has nominated them for deletion. Please also consider looking at #Re-purpose_WikiProject_OCR_to_WikiProject_Scans for an idea to have a "functional quality improvement process" to which such works could be referred upon discovery rather than kicking them straight to WS:PD. If you have other ideas or you have previously suggested something similar to address these frustrations, you could detail them there. Personally, I think we should always prefer improvement over deletion. Exactly what the remediation is (refer to a putative WP:Scans, WS:Scriptorium/Help, directly WS:PD as now, or something else) is not what this thread is for. This thread is for discussing what, if anything, should be the tipping point for deeming a page "lacking" and doing something about it, whatever "something" is. I don't think I can be much clearer that this is not about deletion. If we also have a better venue for improvements, then that's even better.
For example, my personal feeling and !vote on A Critical Dictionary of English Literature is "keep and improve", despite it lacking scans or even links to scans, having only one article and no other content, not even a title page: in short, failing almost every criterion suggested so far in this thread. The only thing it does have is good text quality of the one entry. I personally do not think this work should be deleted, but I do think it should be improved in specific ways. The first half of that sentence is not the focus of this discussion, the second half is. Inductiveloadtalk/contribs 14:18, 9 July 2020 (UTC)
deletion threat has been an habitual method of communicating by admins since the beginning of the project. and text dumps have been habitual following in the gutenberg example. culture change and process change would be required to change those behaviors. we could make it easier to start scan backed works, but the wishlist was not supported. Slowking4Rama's revenge 21:00, 14 July 2020 (UTC)

I don't think this needs to be much of an issue going forward -- we all agree that it's OK to create Index pages for scans, even if none of the Pages have been transcribed yet; so the only case where this would come up is recording research where no scan has yet been identified as suitable to be uploaded. And for that, I still think a WikiProject page is the right location, not mainspace. (Or, if you must, your userpage.) JesseW (talk) 00:59, 6 July 2020 (UTC)
I realized I may not have been clear enough here -- in my view, the ideal process goes like this:

  1. Decide on a work you are interested in (in this case, a periodical/encyclopedic one) -- don't record that anywhere on-wiki (except maybe your user page)
  2. Find and upload (to Commons) a scan of one part/issue/etc of the work.
  3. Create a ProofreadPage-managed page in the Index: namespace for the scan. (You can stop after this point, without worry that your work will later be discarded.)
  4. EITHER
    1. Put further research (on other editions, context, possible wikification, etc.) on that Index_talk page.
    2. Proofread a complete part of the scan (an article from the magazine issue, a chapter from the book, an entry from an encyclopedia, etc.) and transclude it to the mainspace (and create necessary parent pages), and put the further research on the Talk: page of the parent mainspace entry.

If you can't find any scan, and don't want to leave your working notes on your user page, put them on a relevant WikiProject's page.

If you come across such research done by others and misplaced, follow the above process to relocate it to an appropriate place, then redirect the page where you found it to the new location. That's my proposal. JesseW (talk) 01:08, 6 July 2020 (UTC)

@JesseW: It's not clear to me in your above whether when you use the term "index" you refer to a ProofreadPage-managed page in the Index: namespace, or a general wikipage in the main namespace on which an index-like structure (and/or a ToC, or similar) is manually created. Could you clarify? --Xover (talk) 05:14, 6 July 2020 (UTC)
I meant the namespace. Clarified now. JesseW (talk) 05:17, 6 July 2020 (UTC)
  • Hoo-boy. Y'all sure know how to pick the difficult issues…
    My general stance is that: 1) scans and Index: (and Page:) namespace pages have no particular completion criteria to meet to merit inclusion, and can stay in whatever state indefinitely (there may be other reasons to get rid of them, but not this); and 2) the default for mainspace is that only scan-backed complete and finished works that meet a minimum standard for quality should exist there.
    That general stance must be nuanced in two main ways: 1) there must be some kind of grandfather clause for pre-existing pages; and 2) there must exist exceptions for certain kinds of works that meet certain criteria. I won't touch on the grandfather clause here much, except to say I'm generally in favour of making it minimal, maybe something like "No active effort to get rid of older works, but if they're brought to PD for other reasons they're fair game". The design of a grandfather clause for this is a whole separate discussion, and an intelligent one requires analysis of existing pages that would be affected by it. It is always preferable to migrate pages to a modern standard, so a grandfather clause is by definition a second choice option.
    Now, to the meat of the matter: the exceptions…
    We have a clear policy to start from: no excerpts. Works should either be complete as published, or they should not be in mainspace. But quite apart from the historical practices that modify this (which are somewhat subjective and inconsistent, so I'll ignore them for now), there are some fairly obvious cases that suggest a need for more nuance than a simple bright-line rule alone provides. The major ones that come to mind are: 1) massive never-completed projects like EB1911 or the New York Times (EB because it's big; NYT because new PD issues are added every year); 2) compilations or collections of stand-alone works with plausible claim to independent notability.
    For encyclopedias and encyclopedia-like things, we have to accept some subsets due to sheer scale of work. But when that is the grounds for exception, there needs to be some minimum level of completion. I'm not sure I can come up with a specific number of pages/entries or percentage, but it needs to be more than just a single entry (and, obviously, only complete entries). For this kind of exception to apply, I think it needs to be a requirement that the framing structure for it is complete: that is, the mainspace page should give a complete overview of the relevant work even if most of it is redlinks. That includes title pages and other prolegomena when relevant. For a periodical like the NYT, that means complete lists of issues with dates and other such relevant information (e.g. name changes etc.). For preference, these kinds of things should be in Portal: namespace or on a WikiProject page until actually complete, but that will not always be practical (EB1911 and NYT are examples of this). Mainspace or Portal:-space should never contain external links (i.e. to scans) or links to Index: or Page: space (except the implied link of transclusion and the "Source" tab in the MW UI provided by ProofreadPage).
    For exception claimed under independent notability there are a couple of distinct variants.
    Newspaper or magazine articles need to have a certain level of substance in addition to a specific identifiable byline (possibly anonymous or pseudonymous, and possibly identified after the fact by some other source, such as the Letters of Junius) in order to qualify. It is not enough to ipso facto be a newspaper article, a magazine article, a poem, or an encyclopedia entry. On the one hand we have things like dictionaries and thesauri, where an entry could be as little as two words. Or a one-sentence notice without byline in a newspaper. Or two rhymed lines (technically a poem) within a 1000-page scholarly monograph.
    To merit this exception it should be reasonable to argue that the "work" in question should exist as a stand-alone mainspace page (not that we generally want that; but as a test for this exception, it should be reasonable to make such an argument). This would clearly apply to moderately long entries in the EB1911 written by a known author that has their own Wikipedia article. It would apply to short stories or novella-length serialisations in literary magazines by authors that have later become famous (or "are still …"). It would apply to various longer-form journalistic material from identifiable journalists (again, rule of thumb is notable enough for enWP article), including things in magazines that have similar properties. For most periodicals the most relevant atomic (indivisible) part is the issue not the entry or article, but with some commonsense exceptions.
    It would, generally, not apply to things that are works by a single author, like a scholarly monograph that just happens to be arranged in "entries" rather than chapters. It would not apply to things that are essentially lists or tables of data. It would not apply to short entries in something encyclopedia-like or entries that are not by an identifiable author. The OED for example, iirc, is a collective work where entries are by multiple not individually identifiable authors (and each entry is mostly very short too); only the overall editor is usually cited.
    For works claiming this exception too the framing structure should be complete, even if most of it is redlinks. The same general rules about Portal:/WikiProject and no external or Index:-space links apply. An exception would be for periodicals where new issues enter the public domain every year; and we should generally avoid including even redlinks for the non-PD issues here (but may allow them in a WikiProject page). For non-periodical works in multiple volumes where some volumes were published after the PD cutoff, including listings for the non-PD volumes (but not links to scans; those are a copyvio issue) is ok.
    Poems, short stories, and novellas are a special class of works here. A lot of these were first published in a magazine (possibly serialized), and a lot of them exist as multiple editions in substantially the same form. Some exist in multiple versions. These should all primarily exist the same way as chapters as part of their various containing works; but there are some cases where we might want to have, for example, a series of connected pages of the poems of Emily Dickinson. I am significantly ambivalent about this practice, as it amounts to making our own "edition" or "collection" of her poems (in violation of several of our other policies), but I acknowledge that it is an established practice and it is something that has definite value to our readers. It may be that it is actually a practice that should be governed by its own dedicated policy rather than handled within these other general policies.
    For the sake of example; applying this to the works Inductiveload listed at the start of this thread would shake out something like this:
    Auction Prices of Books—This work appears to have no sensible subdivisions and is in any case by a single author. I see no obvious reason to grant this work an exception, except under sheer volume of work and even there I would want to see both a substantial proportion completed and some kind of ongoing effort towards completion (no particular time frame, but definitely not infinite and definitely not as an effectively abandoned project). In a deletion discussion I would very likely vote to delete the mainspace pages here (but, as nearly always, to keep the Index: and Page: namespace artifacts). I don't see this as a reasonable candidate for a Portal:, nor really a good fit for a WikiProject (though I probably wouldn't object to a WikiProject if someone really wanted one).
    Central Law Journal/Volume 1—A single volume is too little, so I would want to see a complete structure for the entire Central Law Journal, with level of detail for each volume similar to the one existing volume. Each article in the journal can be individually considered for a stand-alone work exception; but for the collection I would want to see at minimum a full issue finished to justify having the mainspace structure, and preferably multiple issues (in a deletion discussion I might insist on multiple issues). Index: and Page:-space artefacts can, of course, stay. A Portal: might make sense for selections from the journal, of articles that meet the standalone work exception. A WikiProject to coordinate work and track links to scans etc. might be a decent fit here, if someone wanted that. As it currently stands I would probably vote delete for the mainspace artefacts (with option to move whatever content has reuse value to a non-mainspace page for preservation; and undeleting if someone wants to work on something is a low bar).
    A Critical Dictionary of English Literature—The top level mainspace page has near-zero value, existing only to link to the single transcribed entry. For a credible claim to exception to exist it would need to be a complete framework for the work as a whole, and significantly more than a single entry must be complete. I would probably also want to see ongoing work, unless a substantial percentage of the entries were complete. The single finished entry is eligible to claim a standalone work exception, but I think it probably would not meet my bar for that (I might be wrong; and the rest of the community might judge it differently). In a deletion discussion I would probably vote to delete all the mainspace artifacts here (as always keeping Index:/Page: stuff) but with a definite possibility that I might be persuaded on the one completed entry (an absolute requirement for convincing me would be to scan-back it: as a separate issue, my tolerance for grandfathering of non-scan-backed works is small, and effectively zero for new/non-grandfathered works).
    Bradshaw's Monthly Railway Guide—Would need a full framework and a number of individual issues finished to merit a mainspace page. I see no credible subdivisions for a standalone work exception, but might be persuaded otherwise if, say, one of the train tables was used as a (reliable primary) source in a Wikipedia article (implying some sort of notability beyond just being raw data). In a deletion discussion I would probably vote to delete all mainspace artifacts here. If anyone made the argument, I would entertain the notion that there is value in treating train tables like poems, and hosting a series of train tables like we do Dickinson's poems; but that would require a substantial number of them completed.
    For everything above my stance is nuanced by a willingness to accept temporary exceptions for things that are actively being worked: active being operative, but with no particular deadline to complete the work. We have differing amounts of time available, and some works are so labour-intensive or tedious to do, that my personal threshold for "active" is a pretty low bar to clear. If it's months and years between every time you dip in and do a bit I might start to get antsy, but days or weeks probably won't faze me. And that the projected time to completion is very long at that pace is not particularly a problem so long as it is not infinite. Within those parameters I would always tend to err on the side of letting contributors just get on with it in peace, regardless of any of the policy-like rules sketched above.
    I also want to emphasise that I think this is a very difficult issue to deal with. There are a lot of competing concerns, and a lot of grey areas that will likely take individual discussions to resolve. My balance point on this issue is partly formed by a broader concern about our overall quality (we have waay too many works of plain sub-par quality, and too many not up to modern standards) and a hope that by preventing the creation of these kinds of works (rather than deleting them after creation) we will be able to retain the good and desirable exceptions without dragging down quality, and without the traumatic and stressful events that deletions and proposed deletion discussions are.
    And for that very reason I am grateful this issue was brought up here for discussion, and I hope we can end up with some clear guidance, possibly in the form of a policy page, going forward. And in any case, since it will create de facto policy, this is a discussion that needs to stay open for a good long while (there are several community members that have not yet commented whose opinion I would wish to hear before closing this), and depending on how well we manage to structure the consensus, may also require a formal vote (up in the #Proposals section). --Xover (talk) 09:03, 6 July 2020 (UTC)
  •   Oppose. It is becoming clear that a policy on incomplete works in the mainspace is going to place enormous pressure on individual editors. I think it would be more effective to start a wikiproject devoted to scan-backing works that lack scans and so on. James500 (talk) 12:14, 6 July 2020 (UTC)
    • @James500: FYI, this thread was made in order to provide an exception to the current policy of "no excerpts". A literal reading of the policy as it stands has a plausible chance of coming down delete on the mainspace pages over at WS:PD. This thread is a chance to come up with a better way to support such partial collective works. That we have several substantially incomplete and abandoned collective works lolling around in mainspace is actually the result of laxity in respect to stated policy (not to say I think it's a bad thing). The deletion proposals, whatever you may think of them, are actually not in contradiction to policy. That said, as always, there is scope to adjust policy. Which is what this is.
    • Now, in terms of a WikiProject to scan back works, I think that is a good idea. See #Re-purpose_WikiProject_OCR_to_WikiProject_Scans above, which proposed to reboot Wikiproject OCR as a scan-backing Wikiproject. Inductiveloadtalk/contribs 14:40, 6 July 2020 (UTC)
      • The policy says "When an entire work is available as a djvu file on commons and an Index page is created here, works are considered in process not excerpts." A literal reading of that policy is that no scan-backed work is an excerpt (it is expected to be completed eventually). Further, the policy refers to "Random or selected sections of a larger work". A literal reading of that expression is that it does not include lists of scans, or auxiliary content tables, as they are not "sections" (they are not part of the work), and that not every incomplete portion of a work is either "random or selected" (which would not include starting from the beginning and getting as far as you can, with intent to finish later). I could probably argue that an encyclopedia article or periodical article is a complete work. James500 (talk) 15:16, 6 July 2020 (UTC)
  • Nice wall of text, Xover (and I say that with great respect!) -- it generally makes sense and sounds good to me. As another hopefully illustrative example, take The Works of Voltaire, which I've been digging thru lately. I think this would very much satisfy your criteria as a large work, with sufficient scaffolding to justify the mainspace pages that exist for it. I would love to hear others' thoughts on that. JesseW (talk) 16:07, 6 July 2020 (UTC)
    @JesseW: Yeah, apologies for the length. Brevity is just not my strong suit.
    The Works of Voltaire probably qualifies on sheer scale of work, yes. I don't think the current wikipage at The Works of Voltaire is quite it though: as it currently stands it is more WikiProject than something that should sit in mainspace (its contents are for Wikisource contributors, to organise our effort, not our readers, who want to read finished transcriptions). It also mixes a work page with a versions page in a confusing way. So I would probably say… Move the current page to Wikisource:WikiProject Voltaire; create a new The Works of Voltaire as a pure versions page, linking to…; The Works of Voltaire (1906), that is set up as a work page with the cover and title (and other relevant front matter) of the first volume, and an AuxTOC (and possibly also the {{Works of Voltaire}} volume navigation template). I don't know how tightly coupled the volumes of this edition are (does the first volume have a common ToC or index of works for all the volumes?), so some flexibility on format may be needed to make sense. But as a base rule of thumb it should start from a regular works page and deviate only as needed to accommodate this work (mainly the size is different).
    In any case… With a volume or two completed (they're only ~350 pages each) I'd be perfectly happy having something like that sitting around. With less than that I'd possibly be a bit more iffy, but it's hard to put any kind of hard limit on that. And with somebody actively working on it I'd be in no hurry whatsoever regardless of current level of completion.
    PS. I'm pretty sure a large proportion of the contents of these volumes are works that would qualify under "standalone works" that could exist independently in mainspace, regardless of what's done with the The Works of Voltaire page. Even his individual poems and essays can presumably make a credible claim here (because it's Voltaire; less famous authors would have a higher bar). Better as part of the edition, but also acceptable on their own. --Xover (talk) 16:56, 6 July 2020 (UTC)
  • @JesseW: I personally take no issue with this page's existence (actually I think it's a nice work and a good way to allow an important author's works to be slotted in piece-by-piece). I have some general comments which overlap with this thread (written before Xover's reply, so pardon overlap):
    • First off, I differ with Xover in terms of the scan links: I think they're better than nothing, and I don't see much value in duplicating the volume list onto an auxiliary page just to add scan links. However, I can sympathise with the sentiment that our mainspace shouldn't direct users off-wiki (or at least off-WMF). But if we don't have the scans, and that's what the user wants, they're leaving anyway. Real answer: import moar scans!
    • No scan links are necessary where the volume exists in mainspace and is scan-backed (e.g. v3)
    • Ext scan links should only be used when there is no Index page or imported scan. Use {{small scan link}} or {{Commons link}} when possible (e.g. v2)
    • The first volume list could probably be in an AuxTOC to mark it out as WS-generated content.
    • The "Other editions" section belongs on an auxiliary namespace page (Talk, Portal or Wikisource). I suggest the Talk page is best in this case. Inductiveloadtalk/contribs 17:35, 6 July 2020 (UTC)
  • @Xover: I am in agreement with the majority of what you say. Particularly, I think a framework around any collective work (be it a single-volume biographical dictionary or a 400-issue literary review spanning 80 years) is the critical prerequisite, plus at least some scans, the more the merrier. Where I think I differ:
    • I am inclined to be a bit more relaxed in terms of how much of a work we need. As long as a single article exists, it's not "trivial" (e.g. only a short advert or some incidental text like a "note to correspondents", as opposed to an actual article), it's well-formatted and scan-backed, and a complete framework exists, including front matter and a TOC, such that it is easy for anyone to slot in new pieces, I'd be fairly happy. Lots of periodicals have all sorts of tricky bits like tables of stocks or weather tables, and writing into policy that those must be proofread in order to get the "real" articles into mainspace would be a chilling effect, in my opinion. If you allowed an exception, it would be verbose and tricky to capture the spirit without saying "unless, like, it's totally, like, hard, man".
    • I am not dead against scan links in the mainspace at the top level, when such a top-level page exists. See my comments on Voltaire above. I am against them where they could sensibly be on an Author page and they are the only mainspace content.
    • I am ambivalent on the presence of, e.g., disjointed train timetables. It's not my thing to have a smattering of random timetables, but as long as they're individually presented nicely, it's not too offensive to my sensibilities. I might question the sanity of someone who loves doing tables that much, but whatever floats the boats! Also, I think that this might circle back to "good for export" - a mark which certainly would require completed issues or volumes. If you want to get that box ticked, you have to do it all.
    • Re the "notability" aspect of individual articles, I'm not really bothered by that, as I don't think we'll see a flood of total dross because few people really want to take the time to transcribe 1867 articles about cats in a tree from the Nowhere, Arizona Daily Reporter, and, actually I think some of the "dross" can be quite interesting in a slice-of-life kind of a way (always assuming well-formed and scan-backed). And the real dross is usually so bad (no scans, raw OCR, etc) that it can be dealt with outside of this topic. I think part of the value of WS is the tiny, weird and wonderful, not just in blockbusters like War and Peace and Pulitzers. I think I might like to see more of our articles strung together thematically via Portals, but that's another day's issue. Inductiveloadtalk/contribs 17:35, 6 July 2020 (UTC)
      • @Inductiveload: We appear to be mostly in agreement. But… instead of me dropping another wall of text on the remaining points of disagreement, maybe that means we're in a position to try to hash out a draft guidance / policy type page with the rough framework? Then we could go at the remaining issues point by point. Because I think I'm in with a decent chance to persuade you to my point of view on at least some of them, but this thread is fast getting unwieldy (mostly my fault). It would also probably be easier for the community to relate to now, and much easier to lean on in the future. --Xover (talk) 18:31, 6 July 2020 (UTC)
        • @Xover: If there are no more comments forthcoming after a couple of days, I think that makes sense. I don't want to railroad it: considering we have at least one !vote for "do nothing", I'd like to see if there are any other substantially different opinions floating about. Inductiveloadtalk/contribs 17:41, 7 July 2020 (UTC)

The quantity of text here has grown far faster than my ability to absorb it, so rather than continue to put it off, here's my position: I don't see any problem with transcriptions that are scan-backed, even if the transcription only covers a small fraction of the entire scan. If Sally chooses (say) to transcribe a favorite story, that happened to be published in an issue of Harper's back in the 1890s, and goes to the trouble of uploading the full issue, but only creates pages for the one story that interests her, I think that's great. It doesn't matter to me whether she intends to work on the other pages or not. If it's not scan-backed, but it's fairly high quality, I am personally willing to do some work trying to locate a scan and match it up to the text; I'd rather we take that approach, than deletion, though of course deletion is the better option in some cases where the scan is very hard to come by.

If all this has been said above, or if I've misunderstood the topic, my apologies. Please take this comment or leave it, as appropriate. -Pete (talk) 02:00, 8 July 2020 (UTC)

Apologies, I see I had missed the point.

I disagree with Xover's statement that a top-level page for a publication, with a link only to a single article within the publication, has "near-zero value." Such a page can serve an important function linking content together in ways that help the reader (and search engines) find the content they're looking for, or understand the context around it. For instance, A Critical Dictionary of English Literature is linked from the relevant Wikidata entry. The banner on the Wikisource page clearly tells a Wikisource reader that they won't find a full transcription here; and with a simple edit, it could link to a full scan on another site, or (with perhaps a little more effort) even transcription links here on Wikisource. This page has been here since 2010; we don't have any way of knowing what links might have been created elsewhere in the intervening decade. (I do think that new pages like this should not be created without a scan at Commons to be linked to.) -Pete (talk) 02:12, 8 July 2020 (UTC)

I'm really bad with walls of text, so I have only read a tiny portion of the above discussion. But I want to mention a couple of things that I think are worth considering in this discussion.
  • Most of the time, a mainspace "work" that is only a table of contents, but which has none of the actual content, and is not actively being worked on, can be (and should be) deleted as No meaningful content or history under our deletion policy.
  • A mainspace work that has only a little bit of content, but that content is a work unto itself within the scope of Wikisource, should be kept. Most periodicals are like this. For an example, see the Journal of English and Germanic Philology which only has one hosted article, but that hosted article is scan-backed and firmly within scope.
  • On some occasions, empty mainspace works do have value. I ended up creating the page The Roman Breviary, despite containing no actual content, mostly because there are a lot of works that link to it, using many different titles, and if someone uploaded a copy of the work under one title then many of the links would remain red because they point to different titles of the work. This could be easily solved by creating redirects to a simple placeholder page, so I did. I tried to make the placeholder page as useful as a placeholder page can be, as it contains useful information about the history and authorship of the work, and links to the Index pages where the transcription will take place.

Anyway those are my 2 cents, sorry if they are redundant —Beleg Tâl (talk) 00:40, 29 July 2020 (UTC)

Proposal

Since there has been no extra input for a month, and not wanting this section to get archived without at least attempting a proposal, I have started a proposal #Collective work inclusion criteria above. Inductiveloadtalk/contribs 11:00, 25 August 2020 (UTC)

Since the proposal has now slipped off the main page (to here), with vague support for the first part (collective work inclusion criteria) and a fairly consistent opposition to the second (no-content pages), my plan is to transfer the first part, as guidelines rather than policy, to Wikisource:Periodical guidelines. As non-binding guidelines, they can then be worked on further in situ. Sound OK? Inductiveloadtalk/contribs 08:10, 16 April 2021 (UTC)
The example given in Wikisource:Periodical guidelines might be improved; PSM is and was an exercise that has gone its own way (no offense to @Ineuw:, this is a site under development and that is only one example). CYGNIS INSIGNIS 13:05, 17 April 2021 (UTC)
@Cygnis insignis: You would be wrong to think that I am offended. Remember that when I started, I knew everything. By now, so much of that knowledge is lost that I am happy to listen. Would you elaborate please? — Ineuw (talk) 19:50, 17 April 2021 (UTC)

I've created Bradshaw's Monthly Railway and Steam Navigation Guide (XVI) - it couldn't be done on one page, due to the very high number of template transclusions. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:52, 1 September 2020 (UTC)

@Pigsonthewing: The links in the toc on that page appear non-functional. Also, depending on just exactly which templates were the culprit, it is possible that you may be able to put all the content you wanted onto one page now due to some recent technical changes (template code moved to a Lua module which drastically improves performance and prevents hitting transclusion limits until much later). Xover (talk) 11:17, 14 September 2021 (UTC)
Create the Draft namespace to hold substantially empty works? Then delete if no improvement after months?--Jusjih (talk) 19:22, 1 November 2021 (UTC)
The issue is that the "substantially empty works" can have useful and complete content that stands alone. For example, an article from a scientific journal.
I would not want to see that either shunted into a Draft namespace to rot or deleted a few weeks down the line.
Index and Page namespaces provide our long term staging areas, and works can and do remain unfinished there for years. But what do we do when a self-contained piece of a larger work is ready? Inductiveloadtalk/contribs 20:29, 1 November 2021 (UTC)