Warning Please do not post any new comments on this page.
This is a discussion archive first created in , although the comments contained were likely posted before and after this date.
See current discussion or the archives index.

Copyright and deletion discussions needing community input in December 2019

The following copyright discussions and proposed deletion discussions have been open for more than 14 days, and with more than 14 days since the last comments, without a clear consensus having emerged. This is typically (but not always) because the issue is not clear cut or revolves around either interpretation of policy, personal preference within the scope afforded by policy, or other judgement calls (possibly in the face of imperfect information). Wider input from the community would be valuable in resolving these discussions.

Copyright discussions require some understanding of copyright and our copyright policy, but often the sticking points are not intricate questions of law so one need not be an intellectual property lawyer to provide valuable input (most actual copyright questions are clear cut, so it's usually not these that linger). For other discussions it is simply the low number of participants that makes determining a consensus challenging, and so any further input on the matter would be helpful. In some cases, even "I have no opinion on this matter" would be helpful in that it tells us that this is a question the community is comfortable letting the generally low number of participants in such discussions decide.


Copyright discussions


Proposed deletions


Note that while these are discussions that have lingered the longest without resolution, all discussions on these pages would benefit from wider input. Even if you just agree with everyone else on an obvious case, noting your agreement documents and makes obvious that fact in a way the absence of comments does not. The same reasoning applies for noting your dissent even if everyone else has voted otherwise: it is good to document that a decision was not unanimous.

In short, I encourage everyone to participate in these two venues! --Xover (talk) 06:46, 2 December 2019 (UTC)


Seemingly identical titles are not identical

From time to time I solve the same problem as seemingly identical titles are not identical, compare e.g. Bohemian legends and other poems/A Hussite Song⁠ with Bohemian legends and other poems/A Hussite Song. Some time ago I was told that it happens because of some invisible characters. However, as the characters are invisible, it is very annoying (and also difficult to find out which of the two titles is the wrong one). Is it possible to solve somehow so that it does not happen? --Jan Kameníček (talk) 10:47, 4 December 2019 (UTC)

I do not know what is possible and what is not, but it would be great if:
  1. preferably the invisible characters could be ignored
  2. if not, if they could at least be made visible.
However, I do not know either, whether this can be achieved locally. --Jan Kameníček (talk) 14:17, 4 December 2019 (UTC)
@Jan.Kamenicek: In what situations or contexts do you run across these? There's no general way to ignore or make visible these characters, but there may be ways to help detect or prevent them in specific contexts and situations. We also have some de facto policy that page names only use characters from an extended ASCII subset (it is sadly not an explicit written policy), which means we could conceivably have a bot create lists of pages with "illegal" characters that we could treat as a maintenance backlog to systematically (but not automatically) fix them. --Xover (talk) 14:30, 4 December 2019 (UTC)
@Xover: I am not sure whether I can recall with certainty how it usually happens, but I think that it is in the following way: I have got some OCR text of a downloaded scan. Then I turn some expressions from this text into red links, and after clicking on the link I create a page. I usually do not notice anything suspicious until I make another link to the same page in a different way, e.g. manually, and the link turns red again, although the page has been created. (This is usually only a matter of coincidence, because I often make the links by copying the title of the page, and in such a case the link works well as it was copied together with the invisible character, and so the problem probably stays unnoticed in some cases). --Jan Kameníček (talk) 14:43, 4 December 2019 (UTC)
@Jan.Kamenicek: Hmm. Well, the good news is that since you're (mainly) creating these yourself there's no need to solve this for everyone. The bad news is that there's no obvious and easy fix for it. The best I can come up with off the top of my head is a user script to sanitise links, but you'd need to remember to run it manually every time. Or maybe we could hook into the "Save" button so it checked every time you tried to save a page. --Xover (talk) 16:18, 4 December 2019 (UTC)
Well, I think that if such an invisible character gets into the title of a page from scanned text, it can happen to anyone, not only to me (I think I read somebody else complaining about it here as well some time ago). Something similar happens also in Commons, where they even run some bots to correct names of files or categories from time to time, but correcting the title days or weeks after its creation by bot is late if you need to work with it immediately. So I thought that some general solution like automatic removal of such characters could be found. However, the above mentioned solutions look too difficult :-( --Jan Kameníček (talk) 16:36, 4 December 2019 (UTC)
@Jan.Kamenicek, @Xover: I've suggested some new regexes to add at MediaWiki talk:Titleblacklist. These should help with the problem. Kaldari (talk) 16:47, 4 December 2019 (UTC)
@Kaldari: What is going to happen when somebody tries to create a page with such a blacklisted character? Will the character be removed and the page created without it, or will the page just be refused to be created? If the latter is true, will the contributors get some message with guidance why it was refused and what shall they do? --Jan Kameníček (talk) 18:28, 4 December 2019 (UTC)
@Jan.Kamenicek: The page would be refused and the editor would get an error message. It's actually possible to create custom error messages for each titleblacklist rule. Do you think that would be useful? Kaldari (talk) 19:08, 4 December 2019 (UTC)
The generic error message is: "You do not have permission to create this page, for the following reason: The title 'XXXXXX' has been banned from creation. It matches the following blacklist entry: XXXXXXX." Kaldari (talk) 19:10, 4 December 2019 (UTC)
@Kaldari: It would definitely be useful if the message were more specific and advised what to do, something like: "The proposed title of the page contains forbidden invisible characters, which might be a copied remnant of a scanning process. It is recommended to create the page again with manually typed title." (Or the message can be worded in a more comprehensible way than this attempt of mine.) --Jan Kameníček (talk) 19:47, 4 December 2019 (UTC)
@Jan.Kamenicek: Unfortunately, I don't have editinterface rights, so I can't create the custom error messages myself, but maybe Xover could. Kaldari (talk) 19:54, 4 December 2019 (UTC)
@Kaldari, @Jan.Kamenicek:   Done (diff). The custom error message is at MediaWiki:titleblacklist-invisible-characters-edit. It seems admins are exempted from the title blacklist so I haven't tested it. --Xover (talk) 09:02, 5 December 2019 (UTC)
@Xover: I have just tried it and managed to create a page with the forbidden characters :-( --Jan Kameníček (talk) 09:18, 5 December 2019 (UTC)
@Jan.Kamenicek: The WORD JOINER (U+2060) character was not included in the blacklist rules. I've added it and deleted the test page. See if you can recreate it now? --Xover (talk) 09:38, 5 December 2019 (UTC)
@Xover: Well done! Now it works as expected. Only the message with the code is very long and may be confusing to some, i.e.: "…and the blacklist rule that blocked it ( .*[\x{00A0}\x{1680}\x{180E}\x{2000}-\x{200B}\x{2028}\x{2029}\x{202F}\x{205F}\x{2060}\x{3000}].* <casesensitive|errmsg=titleblacklist-invisible-characters-edit> # Non-breaking and other unusual spaces), in order to be able to help." I suggest leaving the code out of the message. --Jan Kameníček (talk) 10:53, 5 December 2019 (UTC)
Although it's long, I think showing the specific code that blocked the title is important for troubleshooting. Kaldari (talk) 16:48, 5 December 2019 (UTC)
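For checking a title by hand before creating a page, the following is a minimal sketch in Python. It assumes the character class quoted from the blacklist rule above is the complete list of interest; the function names `find_invisible` and `sanitise` are made up for illustration and are not part of MediaWiki or any existing user script. Treating U+180E, U+200B and U+2060 as zero-width (deleted) and the rest as spaces (replaced by an ordinary space) is also an assumption.

```python
import re
import unicodedata

# Same code points as the blacklist rule quoted above, as a Python character class.
INVISIBLE = re.compile(
    "[\u00A0\u1680\u180E\u2000-\u200B\u2028\u2029\u202F\u205F\u2060\u3000]"
)

# Assumption: these three have no width, so they are deleted rather than
# replaced by an ordinary space.
ZERO_WIDTH = {"\u180E", "\u200B", "\u2060"}


def find_invisible(title):
    """Return (index, code point, Unicode name) for each listed character in title."""
    return [
        (m.start(), f"U+{ord(m.group()):04X}", unicodedata.name(m.group(), "UNKNOWN"))
        for m in INVISIBLE.finditer(title)
    ]


def sanitise(title):
    """Delete zero-width characters and turn unusual spaces into plain spaces."""
    return INVISIBLE.sub(lambda m: "" if m.group() in ZERO_WIDTH else " ", title)


if __name__ == "__main__":
    dirty = "Bohemian legends and other poems/A Hussite Song\u2060"
    for index, code_point, name in find_invisible(dirty):
        print(index, code_point, name)
    print(repr(sanitise(dirty)))
```

Reporting the position and name of each hit addresses the "which of the two titles is the wrong one" question raised at the start of the thread, since the clean title returns an empty list.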

New Hampshire versions

we have three uploads to commons of the same internet archive book: Index:New Hampshire (Frost, 1923).djvu, Index:New Hampshire.pdf and File:New Hampshire by Robert Frost.djvu. we might want to adopt the former as the most progressed. and we might want to consider how we coordinate effort in public domain day uploads so as not to duplicate effort. Slowking4Rama's revenge 22:07, 5 December 2019 (UTC)

Macron combined with small caps

It seems that macron above a letter cannot be combined with the {{sc}} template and is pushed to the right, compare e.g. Theatrv̄ with Theatrv̄. It is not a big thing, but if there was some easy solution, it would be nice to solve it. --Jan Kameníček (talk) 11:31, 7 December 2019 (UTC)

@Jan.Kamenicek: This is a font + font renderer issue, probably caused by the lack of a precomposed code point for LATIN SMALL LETTER V WITH MACRON in Unicode (it's actually a LATIN SMALL LETTER V (U+0076) + COMBINING MACRON (U+0304)). Your example renders fine in Safari/Chrome/Firefox on macOS. All {{sc}} does is wrap the text in a span: <span style="font-variant:small-caps">{{{1}}}</span>. --Xover (talk) 12:06, 7 December 2019 (UTC)
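The missing precomposed code point can be observed with Unicode normalisation. The sketch below uses Python's standard `unicodedata` module; the helper name `describe` is made up for illustration. It composes "a" + COMBINING MACRON and "v" + COMBINING MACRON with NFC and lists what remains.

```python
import unicodedata


def describe(text):
    """Return the Unicode name of every code point in text after NFC composition."""
    composed = unicodedata.normalize("NFC", text)
    return [unicodedata.name(ch) for ch in composed]


if __name__ == "__main__":
    # "a" + macron composes to a single precomposed letter.
    print(describe("a\u0304"))
    # "v" + macron stays as two code points, so the renderer has to position
    # the combining mark itself.
    print(describe("v\u0304"))
```

Because the v-with-macron sequence stays as a base letter plus a combining mark, its appearance depends on how well the font and renderer place the mark, which fits the observation that the result differs between systems.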

2020 Scanapalooza

As with last year, we will have many significant works entering public domain in the US in 2020.

An organizational page for identifying and tracking these works now exists at Wikisource:Requested texts/1924. --EncycloPetey (talk) 20:23, 7 December 2019 (UTC)

good work, see also https://everybodyslibraries.com/ where there are some suggestions about PD-not renewed works as well. -- Slowking4Rama's revenge 02:05, 8 December 2019 (UTC)


Google and Phe's OCR

I've been following the posts about Phe's OCR issue on Phabricator, while humming Sam Cooke's "A change is gonna come".

Google OCR is excellent at recognizing the accented text, but it makes a mess of paragraphs by losing words, which sometimes end up at the bottom of the page, if at all.

Do I have any other options? — Ineuw (talk) 01:46, 4 December 2019 (UTC)

i have been known to copy paste the text from Internet Archive text version, when both OCRs fail, but it is slow. i.e. [5] for Index:Proceedings of the Royal Society of London Vol 1.djvu -- Slowking4Rama's revenge 03:02, 4 December 2019 (UTC)
Thanks for the reminder. I have the text layers of all of my projects, but after going through them it seemed to me double work, although a lot of errors can be fixed at once throughout one document with search and replace. — Ineuw (talk) 10:00, 4 December 2019 (UTC)
@Ineuw: Some time ago somebody advised me to copy mw.loader.load( '//wikisource.org/w/index.php?title=User:Putnik/TesseractOCR.js&action=raw&ctype=text/javascript' ) into my common.js. It creates another OCR button, and although it is very slow in comparison with Phe's tool, it is better than nothing. I usually proofread several pages at the same time: while I am working on one of them, the others are being processed. --Jan Kameníček (talk) 10:31, 4 December 2019 (UTC)
@Jan.Kamenicek: Thank you x 3. I had the same experience with Putnik's OCR, and abandoned it because it was so slow. Several pages at a time is an excellent idea. — Ineuw (talk) 19:48, 4 December 2019 (UTC)
I'm not sure if this is relevant here (or if it's already known by experienced Wikisource users), but I've recently figured out a really useful process for quickly getting texts into Wikisource from IA, in a form that requires less editing. It involves copying the "Full text" from the IA page (e.g. here), which generally has line breaks in a position that preserves paragraph breaks (two line breaks per paragraph), unlike the OCR layer. After I've copied that, I also use RegEx (either in a desktop text editor, or using the built-in Wikisource search-and-replace feature), to remove page headings (more or less) and take care of any other general tasks that the particular work in question requires. After doing that "rough cut" of tidying up the text, I use the Help:Match and split tool, creating individual pages for the work that are substantially better than the OCR layer. For an example of a work I've done this with, but I haven't yet done much proofreading for, see here. -Pete (talk) 23:59, 4 December 2019 (UTC)
I always download and keep a copy of the IA text of my projects, but found that cleaning up that sea of text is very daunting. I returned to Google OCR because it recreates the accented Spanish and French text with 99% accuracy. So, for this kind of work it is still the best, even though it pastes some text at the end of the page. Once I got used to it, I don't mind because this behaviour is predictable. — Ineuw (talk) 01:28, 12 December 2019 (UTC)
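The "rough cut" step described by Pete above (strip page headings, keep paragraph breaks) might look roughly like the Python sketch below. The running-head pattern is purely hypothetical: it assumes a heading is a line made of an all-caps title with a page number before or after it, and it will need adjusting for each work. The function name `rough_cut` is likewise made up for illustration.

```python
import re

# Hypothetical running-head pattern: an all-caps title with a page number
# before or after it, alone on a line. Adjust per work.
RUNNING_HEAD = re.compile(
    r"^\s*(?:\d+\s+[A-Z][A-Z .,'-]+|[A-Z][A-Z .,'-]+\s+\d+)\s*$"
)


def rough_cut(full_text):
    """Drop running heads and rejoin wrapped lines, keeping blank-line paragraph breaks."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", full_text):
        lines = [line.strip() for line in block.splitlines()]
        lines = [line for line in lines if line and not RUNNING_HEAD.match(line)]
        if lines:
            paragraphs.append(" ".join(lines))
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    sample = (
        "12 HISTORY OF THE WORK\n\n"
        "The first line of a\nparagraph that wraps.\n\n"
        "A second paragraph\nfollows here."
    )
    print(rough_cut(sample))
```

The output is only a starting point for Match and split; words hyphenated across line ends and headings that do not fit the assumed pattern still need manual attention.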

Main namespace works; portal works and tendency to encyclopaedic components or listings

Putting forward a provocative point of view. [This is neither a holistic essay, nor a rant, it is a pointer to some issues, and some drift over time]. It was the agreed/implicit/practice/working place years ago, and if we are changing these perspectives then let us have that discussion. If we have exceptions, then let us discuss them, and note them on their respective talk pages. Otherwise, this will serve as a basis for some of my current maintenance over the next month. (Not blaming, nor finger-pointing, as this is often a grey line, and we have all probably stood astride that line.)

We seem to have some confusion about pages in main namespace

  • as a place for reproduction of works
     Y purpose of main namespace
  • as a listing for works that we have no obvious or immediate place for
    • volumes of works
    • series of works
     N both of these belong in portal namespace, (hint) if you are using {{ext scan link}} then that is an indicator of portal: or author: ns pages
  • pseudo encyclopaedic article with elements of above
     N portal: ns and/or wikipedia


Contents of main namespace (may not be comprehensive, so consider informative, not normative)

  • Works (transcribed or transcluded)
  • Disambiguation pages—where listing main namespace works, and other pages as required to disambiguate
  • Version pages of works—1 per title of work by author/body; at root level, never as a subpage
    if explanation is needed at a work level—eg. a work has multiple works of same name—then it should be in the notes, we shouldn't create a versions/disambiguation page subsidiary to a work
  • special cases
    • main page
    • sandbox redirect


For us a work is an edition, a static publication. If it is a dynamic work it should be considered either as numerous static editions, or as not part of our scope (WS:WWI).

Some examples of "problematic" things that we get into

  • Hong Kong Fact Sheets
    This is a series of individual publications that are updated at various times; or it is a couple of sets of updated publications; or it is a dynamic work
  • Traffic Signs Manual
    This is either several publications or versions. It is not what it purports, and is not main namespace.

And yes, I do get into arguments with people who come and create main namespace pages pasting a wad of links to external scans. Usually these are in the hope that someone else will come and do the work. These are constructs and are portals.

General guidance: Help:Namespaces

billinghurst sDrewth 02:16, 10 December 2019 (UTC)

If Weird Tales/Volume 1/Issue 1/The Thing of a Thousand Shapes exists, and so does Weird Tales/Volume 1/Issue 1, so should Weird Tales.--Prosfilaes (talk) 11:22, 11 December 2019 (UTC)

Newspapers

Question: What do you think about pages such as The New York Times, which are in the Main namespace? Typically we have hosted the main page for periodicals in the Main namespace, with links to the individual volumes. --EncycloPetey (talk) 02:51, 10 December 2019 (UTC)

to question. That NYT page "as is" is portal namespace page, though has mongrel mix of both components. The root NYT page is an entry point to the transcribed/transcluded content. Newspapers are tricky as we are never going to have all their content EVER. So we just can set them up to link to our content in the main ns. The portal: ns allows all the other flexibility, and was why we set up and re-utilised the portals with that flexibility. — billinghurst sDrewth 03:03, 10 December 2019 (UTC)
That seems to be an arbitrary distinction. There are collected volumes published of periodicals and newspapers. And any newspaper is itself a collection of disparate articles that happen to be published together. I can't see that route ever working, as we would never have uniformity of pagename structure. --EncycloPetey (talk) 16:01, 10 December 2019 (UTC)
Yes, and? We have articles in main namespace, and a structure to present those articles that we have, and one that allows for the addition of further works. Why would we be building an empty red-linked framework with external links in the main namespace? My comment is that curated lists of external links go to Portal, such as the pair of Portal:New York Times for curated lists and New York Times for our reproduced works; cut out the guff in main. — billinghurst sDrewth 23:04, 10 December 2019 (UTC)
OK, "curated lists of external links go to Portal" is a principle we can discuss. I won't post an opinion yet, but that is at least a tangible starting point. --EncycloPetey (talk) 23:12, 10 December 2019 (UTC)

Series

Volumes / Series - the distinction between these two can be fuzzy.

  • The Loeb Classical Library has characteristics of both. There is a uniform format across the volumes, and a common series title, but individual volumes or groups of volumes are by different authors and editors. We handle this collection as a set of volumes, mostly perhaps, because the volumes do have numbers, although the numbers do not always follow a logical sequence.
  • The Portal:Yale Shakespeare is also uniform across the volumes, has a common series title, and although individual volumes have different editors, there is a single author. We handle this collection as a series, in part because the volumes are not numbered.
  • The Ancient Classics for English Readers is a series of uniform format, with different authors for most volumes. We handle this collection as a series, even though the volumes do have numbers, but these numbers are only visible in the printer's marks at the bottom of certain pages.

I think each of the examples I've listed above is appropriately handled, but together they serve to show that the dividing line between "volumes" and "series" is neither obvious nor clear. The only real difference here is that volumes are explicitly and clearly numbered in one case and not the others.

(opinion) "Loeb Classical Library" belongs in portal namespace; it is not a work, and each of the works appears as a root-level work. The series would have its own WD component for links to articles, etc., and that better helps manage the metadata
Re: Loeb opinion: I don't buy the Wikidata argument. Over there we have separate data items for individual chapters of some novels by Dickens and Melville. But how is the Loeb Classical Library materially different from Sacred Books of the East, which is also a collection of many separate works by different authors and translators, each of which merits a Wikidata page? Yet there is no question of hosting that set of volumes in the Main namespace? --EncycloPetey (talk) 03:10, 10 December 2019 (UTC)
"What about the Golden Book series?" types of an argument doesn't help, so I don't think that the work-to-work comparative route is the best route. Can we start with what are our principles for siting our reproduction, and then we align from there.

To me a series page belongs in portals; if a volume is part of a series, and exists in our space, then we can throw in a redirect. That an editor reinvigorates a work and includes it in their series is neither here nor there, it is an edition of a work unless you can find some other reason to place it as a subpage, i.e. first time brought into publishing world in a language, or the works translators and some other special curatorial aspects. — billinghurst sDrewth 03:27, 10 December 2019 (UTC)

But that's exactly the problem. I don't see any principles or rationale for when a Mainspace work should list volumes and when the list should be located outside the Main namespace. There are no principles I can think of that would be applicable as a generality. We're thus left with deciding every single case one by one, in the absence of any guiding principles. --EncycloPetey (talk) 03:37, 10 December 2019 (UTC)
Then don't look at the "volumes"; it is just a production model. A series is a series. A work is a work, whether it is published in one volume or multiple—knowing that some were volumes only due to the subscription model, or size of a work and binding issues for the size of page. — billinghurst sDrewth 03:42, 10 December 2019 (UTC)
That doesn't help clarify the distinction. You've declared that there is a difference, but haven't provided any criteria for making the distinction, nor for deciding whether or not to use a Mainspace page. My point is that, without such criteria, we have no objective way of deciding which items should be in the Mainspace and which should be elsewhere. We have, for example Wuthering Heights (1st edition), whose title page declares that it is a novel in three volumes, but the third volume is in fact a different work by a different author. The Masterpieces of Greek Literature (1902) is a collection of many works, but they happen to have been bound together and published in a single volume. The World's Famous Orations is likewise a collection of works, but with a narrower focus and over several volumes, some of which are parts of a sub-collection: Volume I is Greece, but Volumes 8 to 10 constitute a single collection for America. You have provided no criteria for deciding which of these items warrant a Mainspace page and which do not. All you have said is that we shouldn't look at the volumes. --EncycloPetey (talk) 04:01, 10 December 2019 (UTC)
Wrong focus. The crosses are on the first level, not the second level which are the example of the first level. Please re-read in that context. My argument is not based on the word volumes but on the and its presentation of a reproduced work. — billinghurst sDrewth 08:51, 10 December 2019 (UTC)
Sorry, but if you were trying to clarify something, I don't understand at all now. Your last sentence in particular seems to be missing words. --EncycloPetey (talk) 15:58, 10 December 2019 (UTC)
I'd think any of these series could be stored in mainspace. Portals should be our creation, not someone else's.--Prosfilaes (talk) 11:22, 11 December 2019 (UTC)

Template Bloat...

Page:Hill's manual of social and business forms.djvu/108 vs Page:Hill's manual of social and business forms.djvu/120

The former doesn't quite match up to the original, but uses considerably fewer template calls... 1 or 2 per page vs 30.

Can someone with coding expertise look into this in more depth, so that ALL the old template bloat across English Wikisource can eventually be removed?

Common formatting in Long lists is an area where something like this kind of template bloat is likely to occur on a frequent basis. ShakespeareFan00 (talk) 15:05, 9 December 2019 (UTC)

@ShakespeareFan00: This isn't the kind of template bloat I'd worry overmuch about unless and until it causes actual problems. Use the {{sc}} shortcut to cut down on visual noise in the page source, and don't worry about it. Excepting where there are available and well supported CSS pseudo-classes/-elements to use, we'll have to apply containing elements to mark this kind of thing, and all {{sc}} does is spit out <span style="font-variant:small-caps">{{{1}}}</span>. That's about as lean as can be expected (well, except we should migrate it to use TemplateStyles, but then you'd have the class attribute instead of the style attribute for only minimal savings).
However you're right that we have a lot of use cases that could benefit from a more CSS-based approach and we should do more to identify, organize, and implement/migrate to these. I think, though, that that would best be done by starting to collect various such cases on a page somewhere rather than attacking individual works or templates as they come up. Not least because the piecemeal approach tends to lead to ridiculous results such as {{float left}} and {{float right}} taking different parameters (for the offset, left is unitless, right needs the unit specified).
Incidentally, I also think we could beneficially have a manual of style that describes common scenarios where reproducing the original faithfully is too much effort / too template intensive / has other drawbacks, and what are acceptable fallback approaches for these. I'm not sure using a definition list for this particular use case is necessarily the best solution, but the slightly inaccurate rendering it produces is within the bounds of the sort of thing I would consider acceptable for such documented tradeoffs. --Xover (talk) 17:50, 9 December 2019 (UTC)
Well one for your list of too difficult's, to start things is sidetitles (i.e what cl-act-p and family was trying to do), which are a pain to do in HTML/CSS. Not even the legislation.gov.uk site attempts to do an exact print reproduction in respect of sidetitles on actual legislation. ShakespeareFan00 (talk) 18:02, 9 December 2019 (UTC)
Another bloat reduction approach is table classes... (see Template:Table class and sub-pages) This saves having to make extensive calls to {{ts}} which in itself could be made more efficient. ShakespeareFan00 (talk) 18:02, 9 December 2019 (UTC)
{{sc}} is a redirect. That doubles the number of template calls for a start... ShakespeareFan00 (talk) 18:04, 9 December 2019 (UTC)
you could start with some consensus standards for tables of contents, and indexes, and then propagate at help. and then put on a maintenance category to migrate. if we had a toc helper to a simple table, that would be a big help. Slowking4Rama's revenge 23:02, 9 December 2019 (UTC)
It would be easier to read the source for tables of contents if it was possible to set the parameters for a string of rows separately from the contents of the rows. I.E.:
TOC begin|dotted=yes|parameter2|etc.
TOC line |chapternum|title|pagenum|col4
TOC line |chapternum|title|pagenum|col4
TOC end
And it should be possible to put multiple of these one after another with no gap between so that rows with different parameters could be combined into one table. Does that make sense and is it workable? Levana Taylor (talk) 12:25, 11 December 2019 (UTC)
That approach should be what any {{TOCstyle}} revamp does. Currently it generates single row tables inside list items, and a LOT of repetitive inline CSS which is not efficient coding. (If leaders are used, it rapidly blows mediawiki limits, see comments elsewhere for other contributors' views on these.) Using certain features of CSS and TemplateStyles, bloat in TOC style markup could be reduced at the expense of some cosmetic functionality. ShakespeareFan00 (talk) 20:17, 11 December 2019 (UTC)
Why would you lose any cosmetic functionality? There aren’t any inline properties that can't be applied as a class, are there? Currently each line not only can be formatted individually but must be, which is nuts. The way I envision it, if a line is different from the others, you’d just have to enclose it in a separate higher-level TOC template, and the TOC template would have to be written so that several of them can be combined seamlessly into one table -- {{TOC line}} does that fine now. Levana Taylor (talk) 19:44, 12 December 2019 (UTC)
In {{TOCstyle}}, there are options for leadersym, to set the symbol used to add the leader. This is the cosmetic function that would be lost by moving the leader behaviour to a stylesheet. See {{TOCstyle/sandbox}} and Module:TOCstyle/experimental.

Doing it as one template call per line (Template:TOCline) 'blows through' the template transclusion limits if the TOC is very large.

I don't disagree with TOCstyle being re-written to generate a table instead of a list though, as that could remove a level of complexity if done carefully. ShakespeareFan00 (talk) 23:19, 12 December 2019 (UTC)
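To make the TOC begin / TOC line / TOC end proposal above concrete, here is a rough Python sketch of the idea: group-level parameters are stated once, and consecutive groups are merged into a single table. Everything in it is an assumption for illustration only: the function name `render_toc`, the `toc-` class-name scheme, and the choice to emit a plain wikitext table. It is not how {{TOCstyle}} or {{TOC line}} are actually implemented.

```python
def render_toc(source):
    """Turn TOC begin / TOC line / TOC end groups into one wikitext table."""
    out = ["{|"]
    row_class = ""
    for raw in source.splitlines():
        parts = [part.strip() for part in raw.split("|")]
        keyword, args = parts[0], parts[1:]
        if keyword == "TOC begin":
            # Group settings are written once and become class names that a
            # stylesheet (hypothetical) would style, instead of inline CSS per row.
            row_class = " ".join("toc-" + arg.replace("=", "-") for arg in args)
        elif keyword == "TOC line":
            out.append(f'|- class="{row_class}"' if row_class else "|-")
            out.append("| " + " || ".join(args))
        elif keyword == "TOC end":
            row_class = ""
    out.append("|}")
    return "\n".join(out)


if __name__ == "__main__":
    source = "\n".join([
        "TOC begin|dotted=yes",
        "TOC line |1|Introduction|5",
        "TOC end",
        "TOC begin",
        "TOC line |2|Next chapter|9",
        "TOC end",
    ])
    print(render_toc(source))
```

Two groups with different parameters end up as rows of one table, which is the "no gap between" behaviour asked for above. A per-row class costs a few characters where inline CSS repeats the whole declaration on every line.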

History of the United States of America, Spencer, v1

The following discussion is closed:

Done.

The project Index:History of the United States of America, Spencer, v1.djvu contains two duplicate pages of p404 and p405 where p402 and p403 should be.

See: here and here

I have located what appears to be the correct replacement pages for these from a slightly later edition of the work at the internet archive. See: p402 p403

Would somebody be able to replace these two pages for me? I will proofread them once done.

Thanks Sp1nd01 (talk) 21:29, 11 December 2019 (UTC)

@Sp1nd01: Done. Please let me know if you need any old Page: pages deleted or moved without leaving a redirect etc. --Xover (talk) 07:39, 12 December 2019 (UTC)
@Xover:, Thank you for the quick fix. All appears to be in order, as far as I can tell nothing further needs deleting or moving. Sp1nd01 (talk) 12:58, 12 December 2019 (UTC)
This section was archived on a request by: Xover (talk) 15:16, 12 December 2019 (UTC)

Cached links table not updating...

Category:Pages using a custom leader in TOC or Index content: I updated the tracking categorisation, but it isn't propagating now that the template has been saved. I'm having to MANUALLY NULL EDIT the category contents page by page, which is tedious, time-consuming and would be unnecessary if someone sat down and fixed the actual "problem" with the back-end. ShakespeareFan00 (talk) 12:03, 10 December 2019 (UTC)

It will happen over time as resources are low enough. Is it truly necessary to spend time pushing this to occur? What is the value in actioning? — billinghurst sDrewth 12:30, 10 December 2019 (UTC)
It's not urgent. I was updating the categorisation to determine how much would break if an alternative approach to TOCstyle was implemented, namely moving what's currently done as inline CSS to a dedicated stylesheet (assuming TemplateStyles can be used from Modules) that need only be included once per Template/Module invocation, or even once per page. Inserting 384 dots directly into the output seems to be the WRONG way of doing it when, for the default case, those dots could be inserted using a ::before or ::after pseudo-element in the stylesheet. <span class=dot-leader> is considerably less than 384 characters, which would obviously reduce the duplicated HTML generated, meaning that internal limits would be less likely to be encountered.

ShakespeareFan00 (talk) 14:56, 10 December 2019 (UTC)

Who says we need to do any addition of leaders? Do we need to rehash a philosophical discussion about slavish replication of 19thC fixed dimensions paper book world, and that replication to 21st web world of css on wide screen computers or narrow screen devices.

Now when some users set a ToC to pretty much full width, then I would agree that leaders can be useful, though I would think that the better conversation is whether we should restrain the width of a ToC, not the use or not use of leaders because that book edition had them. I would suggest that a ToC and an Index are components that should and can be representative, not facsimile.

To remember that the ToC and the index are generally not the works of the author, and instead an artefact of the publishing. Are we getting the words of the author right? Are we truthfully (not slavishly) reproducing an edition work so it can be understood as the output of the author? — billinghurst sDrewth 22:54, 10 December 2019 (UTC)

For what it's worth I updated some of the TOCstyle code in a fork and got a reduction in generated code by a third. A redesign of the layout structures used in TOCstyle would give further reductions (irrespective of the use of leaders). ShakespeareFan00 (talk) 09:39, 13 December 2019 (UTC)

How often does the database for searches actually update?

https://en.wikisource.org/w/index.php?title=Special:Search&limit=500&offset=0&ns0=1&ns1=1&ns2=1&ns3=1&ns4=1&ns5=1&ns6=1&ns7=1&ns8=1&ns9=1&ns10=1&ns11=1&ns12=1&ns13=1&ns14=1&ns15=1&ns100=1&ns101=1&ns102=1&ns103=1&ns104=1&ns105=1&ns106=1&ns107=1&ns114=1&ns115=1&ns828=1&ns829=1&ns2300=1&ns2301=1&ns2302=1&ns2303=1&sort=create_timestamp_desc&search=insource%3A%2Fclearfix%2F&advancedSearch-current={}

I can edit, update the search and find that entries I'd ALREADY resolved still appear in the Search Results. I assume the searches are being cached somehow? ShakespeareFan00 (talk) 18:48, 13 December 2019 (UTC)

Database updates with edits … mw:help:CirrusSearch billinghurst sDrewth 11:16, 15 December 2019 (UTC)

FYI: Moved author pages not updating at Wikidata

On moving author pages in the past few days, the links are not being updated at Wikidata. I have flagged the issue at Wikidata chat, and am seeking their guidance on how to progress the matter, especially not knowing which part of the system is at fault.

I have no idea whether it is more than author pages, or not. I don't see that it is main namespace pages, though I haven't dug through sufficiently to check, though nothing evident in the tracking category. — billinghurst sDrewth 10:25, 15 December 2019 (UTC)

The files in this category have not yet been pagelisted, and in some instances also lack the authorship information needed to determine their status. It would be appreciated if other contributors in a position to provide additional information could do so. ShakespeareFan00 (talk) 10:05, 14 December 2019 (UTC)

Talk to the uploaders, or work with them to get it done. Not certain that we should be imposing this upon users here. We all know the place exists, and there are thousands of works that need more work than these. All maintenance is important, and we all impose it upon others, and choose to do what we wish to when we can. — billinghurst sDrewth 11:14, 15 December 2019 (UTC)
unclear to me what you want to "check" with Index:KJV 1778 Oxford Edition.pdf ? do you want better metadata? or a completed index pagination? a quality improvement team to triage and improve indexes would be a better approach, than an ad hoc "look at this" approach, Slowking4Rama's revenge 14:57, 16 December 2019 (UTC)
Ideally both improved meta-data and fully completed page-lists would be desirable, There are only 5 remaining items in this category. :) ShakespeareFan00 (talk) 17:27, 16 December 2019 (UTC)

00:15, 17 December 2019 (UTC)

Problematic redirects (versions in a page with other works) and WD

The following set of triplet links are local redirect · wikidata link for redirect · redirect target

The issue is that redirects should not have wikidata items, especially when they become a part of a page that could have its own item.

Paths to resolution

  1. do nothing locally; at WD delete the interwiki (redirect) links, request WD item deletion; or
  2. locally convert to a versions page as we know that these are all reproductions, not originals; at WD make the pages work pages (merge items if work page already exists); or
  3. make a specific edition page (new concept) with the single link to the edition page; at WD we make the link an edition page

These items, themselves, are empty of description at WD bar the wikilink.

Of these, I prefer option 2. It gives the best linkage and visibility, and sets it within any broader context. — billinghurst sDrewth 11:10, 15 December 2019 (UTC)

Wikidata does allow items to link to redirect, as far as I know, but I do think that it is preferable to have it link to a versions page with only one single version (option #2) than any of the other suggested alternatives. —Beleg Tâl (talk) 15:28, 17 December 2019 (UTC)
As far as I can tell, this happened because we had items at those locations, then Wikidata items were created for them en masse using a bot, then someone here cleaned up the local item by redirecting the page to a scan-backed copy. The result is that WD now has a data item for the redirect. In the case of at least the first two items, we would want a WD item eventually, since both are self-contained compositions that might wish to be referenced. One option is to create a version page. Another is to back-create an edition using section editing. That is, insert sections into the main work so that the individual portion can be transcluded in isolation, with bibliographic references. --EncycloPetey (talk) 17:40, 17 December 2019 (UTC)

Match and Split (Phebot) not working

Match and split is a really useful tool, especially when used in conjunction with OCR text which has already been somewhat cleaned up. But the bot that drives it has been offline for a couple days. I see that User:Phe has not been active here on English Wikisource for many months. Does anybody know a good way to get this bot (or another process that performs the same functions) back online? -Pete (talk) 23:57, 16 December 2019 (UTC)

You can try Phabricator, but having the experience of fruitless begging for fixing the Phe’s OCR tool there for many months, I do not see your chances very promising. --Jan Kameníček (talk) 00:25, 17 December 2019 (UTC)
Thank you Jan, I'm happy to create a phab ticket, but I'd like clarity on a couple points first, if you or anybody is able to provide it. (1) Does anybody know whether Phe's code is available anywhere public? (2) Does anybody know whether the specifications for that code were articulated or discussed prior to its being created? (3) Is this the kind of code that can, or should, be run on the wmflabs site? Apart from getting the code itself, what obstacles to that might exist? -Pete (talk) 18:39, 17 December 2019 (UTC)
@Peteforsyth: The code for all Phe's tools is available at https://github.com/phil-el/phetools, and all the tools run at the Toolserver here: https://tools.wmflabs.org/phetools/.
The problem here is that they are maintained only by Phe (who is a volunteer and cannot be expected to be at our disposal in any given timeframe, or at all for that matter), so nobody else can do much about problems with them, and Phe has been unavailable since this summer. If anybody wants to fork the code and set up an alternate tool then that is absolutely possible, but these are not particularly simple tools so it will take a commensurate amount of skill; a not insignificant investment of time to understand the code and get the alternate tools running; and a ditto time commitment to maintaining them over the long haul (otherwise we'll just be back here in three months).
I'd be happy to help however I can if anyone wants to make the attempt, but I'm allergic to PHP and Python (which are the main languages used in phe-tools) and cannot commit to any predictable amount of time. --Xover (talk) 19:06, 17 December 2019 (UTC)
Thank you Xover, very helpful. I am of course very cognizant that nobody is obligated to do anything here -- at this point I'm just wanting to help document what would be desirable to get done. This helps a great deal, I will open a phabricator ticket. -Pete (talk) 19:10, 17 December 2019 (UTC)
One further question, then -- I notice that the robot is now running (yay!) Is it because it lives on the toolserver, that it's possible for it to be started without Phe's intervention? (I had previously thought this was a bot that ran on Phe's own computer.) When the robot goes down like it did a few days ago, what is it that needs to happen (in its current state) for it to be restarted -- and what is the correct process for requesting that thing to happen? -Pete (talk) 19:15, 17 December 2019 (UTC)
@Peteforsyth: The Toolserver is a relatively complex beast, so reducing it to simple statements is going to be misleading. Nobody but the maintainer can normally restart a tool, unless there is a security or infrastructure stability issue that requires intervention by "sysadmin"-type people. However, the various actual servers that make up the service we call "Toolserver" are sometimes rebooted for other reasons (which has the side-effect of restarting the tools running there), or one of them crashes (which is fixed by rebooting it, which has as a side-effect… etc.), or …
In this particular case (Match & Split down) it seems likely that the issue was a transient one with some component in the Toolserver service, and that the tool was not restarted as such: whatever happened in the infrastructure had as a side-effect that Match & Split started working again.
Which, by the way, is bad news for the OCR problem: if it was something a restart would fix, it is likely it would have resolved itself by now because the hosts get rebooted periodically for other reasons. That it still fails suggests that actual changes to the tool are needed, and that's something that requires an active maintainer with available cycles.
For Toolserver tools in general, the way to get the tool restarted is to contact the maintainer. They're the only ones with the access rights to do so. --Xover (talk) 07:29, 18 December 2019 (UTC)
For the record, it looks like Tpt is also a maintainer of Phetools, which is good news since Tpt is still quite active on Wikisource. If either Phe or Tpt want to add me as a maintainer as well, I would be happy to help keep an eye on it. I'm a secondary maintainer on lots of random Tool Forge tools. Kaldari (talk) 22:36, 18 December 2019 (UTC)
@Kaldari: That'd be great! But note that 1) I suspect tpt may have pings disabled, and 2) they may have been added as a maintainer on the same terms as you're volunteering for. Phe has not edited any Wikimedia project or had any activity on Github for 6+ months now (frankly I'm a little worried about them), and without their explicit consent tpt may not be comfortable adding you on their own cognisance. --Xover (talk) 07:22, 19 December 2019 (UTC)

Are featured texts also validated texts, or does featured supersede validated?

I unsuccessfully tried to get some input on this question in the KaldariBot discussion above, but didn't get any strong opinions. My question is:
Should "featured" text status replace the "validated" status or exist along-side it? In other words, should a featured text be marked as both "featured" and "validated", or is it just "featured" (which implies that it is also validated, proofread, etc.)?
Any feedback on this question would be appreciated. Kaldari (talk) 22:25, 18 December 2019 (UTC)

A text cannot be featured unless it is fully validated. However, "featured" is not a proofreading status step for an Index, so it exists alongside. Beeswaxcandle (talk) 06:03, 20 December 2019 (UTC)
I agree with Beeswaxcandle. The text progress statuses and the featured status are orthogonal: while all featured text are by definition validated, semantically the two mean different things. I'll throw in the caveat that I don't comprehend Wikidata sufficiently to judge whether it might make sense to overload a single property rather than having separate properties; but in all other contexts (logical information model on down to database schema) I would have kept them separate. --Xover (talk) 06:49, 20 December 2019 (UTC)
The closest analogy to this is WP's "Good" and "Featured" status. Each of those is determined independently, however. An article can attain Featured status without attaining "Good" status; two separate procedures and evaluations are made to determine each status. So, if we decide to have both badges for Featured articles, this wouldn't look any different to what WP does. I say this only to note that no WP editors would think it odd that both badges existed on the same item, so we're not likely to confuse anyone if we double-badge our Featured articles. --EncycloPetey (talk) 18:49, 20 December 2019 (UTC)
Here are some examples of enWS pages which are featured but not validated:
So we cannot take it as given that all featured pages are validated. —Beleg Tâl (talk) 19:20, 20 December 2019 (UTC)
Thanks for the feedback, all! Sounds like the consensus is that "featured" and "validated" badges should exist independently of each other. Kaldari (talk) 20:07, 20 December 2019 (UTC)

New Wikisource users

Hello, all,

I organized a little "Wikisource party" earlier today for some WMF staff who were interested in how things work here. We had lots of questions, and you got several pages proofread as a result.

I also wanted to say that one of them had decided to try it out in advance, and he felt encouraged and reassured when someone thanked him for his first attempt. He's since proofread about another 40 or 50 pages, so it's working. ;-) Thanks for being such a friendly community. Whatamidoing (WMF) (talk) 23:47, 19 December 2019 (UTC)

Thanks for doing that, and for letting us know! You may already know this, but please let people know that WS:S/H is a great resource for asking questions. -Pete (talk) 23:49, 19 December 2019 (UTC)
Thanks so much for taking the initiative to do that. Very much appreciated! And please do let us know if we can assist in any way. --Xover (talk) 06:27, 20 December 2019 (UTC)

IA uploader cannot find an existing archive.org file

I would like to upload volume 31 of the National Geographic Magazine, see [10]. Originally, I downloaded the pdf file from HathiTrust, but my attempts to convert it into djvu using some online converters failed, so I uploaded it to the Internet Archive and tried to upload it to Commons using IA uploader. However, the uploading process has already been ongoing for many hours, and when I looked at the view log, it said: "invalid ia identifier, I can't locate needed files", which is strange.

May I ask for help with converting and uploading the file? --Jan Kameníček (talk) 13:02, 21 December 2019 (UTC)

sometimes there is a lag as IA does its internal conversions. and then IA uploader goes slow as it converts to djvu. (and commons will not like PDM) i would retry, since this process is an open ticket. Slowking4Rama's revenge 16:48, 21 December 2019 (UTC)
File:The National Geographic Magazine Vol 31 1917.djvu. Mpaa (talk) 17:25, 21 December 2019 (UTC)
Perfect, thanks very much! --Jan Kameníček (talk) 21:20, 21 December 2019 (UTC)
Having looked at the uploaded file in detail, I can see that the converting process has diminished the quality considerably and the OCR layer was destroyed too, see for example [11] --Jan Kameníček (talk) 21:39, 21 December 2019 (UTC)
@Jan.Kamenicek: I think the reason is that IA has derived low quality jp2 images from the pdf file, and ran the process based on them. Also, the quality of IA pages displayed in their reader is poor. Mpaa (talk) 18:06, 22 December 2019 (UTC)
Sorry if this is a stupid question, but is there a reason you can't just use the PDF? BethNaught (talk) 21:58, 21 December 2019 (UTC)
@BethNaught: The most important reason is that Mediawiki has various problems with PDFs, the biggest of them being that it does not extract the original text layer of PDFs well (for detailed description of the problem see e. g. here). Conversion into djvu usually improves it. Besides that, the PDF file is over 100 MB, so it needs to be downsized in some way as Commons does not accept such large files. Conversion into djvu results in smaller size, another way would be keeping PDF but lowering its quality. Conversion into djvu usually solves most problems, but this time they seem enhanced instead :-( --Jan Kameníček (talk) 22:08, 21 December 2019 (UTC)
I had no problem uploading File:The National Geographic Magazine Vol 31 1917.pdf using the UploadWizard or ChunkedUpload (finishing as I type this). —Justin (koavf)TCM 22:29, 21 December 2019 (UTC)
@Koavf: Oh, that is wonderful, thanks very much! I did not even try it as Commons had always refused me when uploading over 100MB files, so something must have changed there. There is still the problem of text extraction from PDFs, but this time I will go with PDF, as it is (strangely) much better than DJVU, comparing e.g. [12] and [13]. --Jan Kameníček (talk) 23:00, 21 December 2019 (UTC)
No problem. That DJVU is pretty garbage. —Justin (koavf)TCM 23:40, 21 December 2019 (UTC)

Italian Wikisource

I was surprised to find an English-language text on Italian Wikisource: it:Scientia - Vol. VII/The origin and nature of comets; can it be imported here? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:41, 23 December 2019 (UTC)

Yes, it can. In effect, we would duplicate the Index page here as well as the relevant English language pages. I'm not sure whether anyone has developed a tool that would assist with doing this, as it's an uncommon occurrence. It would be tedious to do by hand, particularly if we're keeping the edit history intact to credit the work done at it.WS, but it could certainly be done. --EncycloPetey (talk) 17:22, 23 December 2019 (UTC)

20:03, 23 December 2019 (UTC)

Naming of governmental works and duplication resulting therefrom

I have just noticed White House memorandum of a telephone conversation between U.S. President Trump and Ukraine President Zelensky, July 25, 2019, which is based on a validated scan, is a duplicate of the earlier Memorandum of Telephone Conversation with President Zelenskyy of Ukraine. In addition, the earlier page was created by the same user (IP address) that created Letter to Chairman Burr and Chairman Schiff, August 12, 2019, which is the whistle-blower report. Neither of these names, and especially not the second, is a proper indicator of what the text is in actuality. These could be more readily accessed if names more properly indicative of their contents were given, and if they were connected to the relevant Wikipedia article(s). TE(æ)A,ea. (talk) 22:11, 24 December 2019 (UTC).

we have a style guide Wikisource:Style guide, if you want to add a section on work title, go for it. item find is pretty bad, but if you want to organize with a category or portal, or wikidata, go for it. -- Slowking4Rama's revenge 04:11, 25 December 2019 (UTC)
Doesn't seem to be style issue.

Naming of pages can be difficult especially where it is correspondence, see special:prefixindex/Letter from. Creating redirects, putting it into Wikidata, and linking from the articles at enWP are all part of the process. If you think that a work should have a different/better name, then propose it and we can move it. We are not averse to a diversity of opinion and a discussion. — billinghurst sDrewth 10:03, 25 December 2019 (UTC)

The following discussion is closed:

Resolved.

I have uploaded a corrected source file, but the file correction necessitates some page moves. I have made the more complicated moves, but would appreciate it if someone with a bot can complete the process.

Only pages in the (DjVu) range /48 to /216 need to be moved, and these pages need to be moved one page down. That is:

  • Page:Poetry of the Magyars.djvu/48 --> Page:Poetry of the Magyars.djvu/47
  • Page:Poetry of the Magyars.djvu/49 --> Page:Poetry of the Magyars.djvu/48
  • ...
  • Page:Poetry of the Magyars.djvu/215 --> Page:Poetry of the Magyars.djvu/214
  • Page:Poetry of the Magyars.djvu/216 --> Page:Poetry of the Magyars.djvu/215

All pages outside the range stated above already are in the correct location. I had to do some of those by hand because the original file was missing two pages, in addition to other problems, so not all pages needed to be moved one. --EncycloPetey (talk) 20:05, 26 December 2019 (UTC)
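The move list described above can be generated mechanically. The following is a minimal Python sketch that only plans the source and target titles; it does not perform any moves, and the function name `plan_moves` is a hypothetical helper, not part of any existing bot. It assumes that when shifting pages down by one, the lowest-numbered page must be moved first so that each target title is already vacant.

```python
def plan_moves(title, first, last, shift=-1):
    """Return (source, target) page-title pairs in a safe execution order.

    Shifting down (negative shift) proceeds from the lowest page upwards,
    shifting up proceeds from the highest page downwards, so a move never
    lands on a page that has not yet been moved out of the way.
    """
    if shift < 0:
        order = range(first, last + 1)
    else:
        order = range(last, first - 1, -1)
    return [(f"Page:{title}/{n}", f"Page:{title}/{n + shift}") for n in order]


moves = plan_moves("Poetry of the Magyars.djvu", 48, 216)
for source, target in moves[:2]:
    print(f"{source} --> {target}")
```

Each pair could then be handed to whatever bot framework performs the actual move; that step is deliberately left out of the sketch.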

Done (in a minute). Mpaa (talk) 20:43, 26 December 2019 (UTC)
This section was archived on a request by: Xover (talk) 10:56, 29 December 2019 (UTC)

Page deletions

The following discussion is closed:

Resolved.

A quick speedy -Template:Ws diclist smallcaps.css , but I can't tag the page as such. This was created in error. ShakespeareFan00 (talk) 19:07, 9 December 2019 (UTC)

@ShakespeareFan00:   Done --Xover (talk) 19:30, 9 December 2019 (UTC)
This section was archived on a request by: Xover (talk) 10:59, 29 December 2019 (UTC)

Page numbers not displayed

The following discussion is closed:

Resolved.

Does anybody know why the page numbers of "Russian Government Links To And Contacts With The Trump Campaign" and of other subpages of the work are not displayed? --Jan Kameníček (talk) 23:26, 17 December 2019 (UTC)

It could be because some of the "page numbers" are 15 characters long or longer. Page numbers typically should be 4 or 5 characters max. --EncycloPetey (talk) 23:29, 17 December 2019 (UTC)
That was it. I reworked the pagelist and it helped. Thanks. --Jan Kameníček (talk) 00:23, 18 December 2019 (UTC)
This section was archived on a request by: Xover (talk) 11:00, 29 December 2019 (UTC)

Index:Old-folks.jpg

The following discussion is closed:

Resolved.

Just doing a random page validation and I find that I am unable to validate the Index:Old-folks.jpg page. I don't receive the validated button when attempting to save it. Other pages on other works do show the validated button. Is there a problem with this particular page? Sp1nd01 (talk) 10:20, 18 December 2019 (UTC)

@Sp1nd01: For some reason, a previous edit had managed to remove the username from the (invisible) noinclude section where the page status is stored. I've edited the page (set it to problematic and then back to proofread) so that my username got inserted there, so now you should be able to set it to validated. --Xover (talk) 11:04, 18 December 2019 (UTC)
@Xover: Thank you, that has now worked as expected. Sp1nd01 (talk) 13:31, 18 December 2019 (UTC)
This section was archived on a request by: Xover (talk) 11:01, 29 December 2019 (UTC)

Weird Tales vol. no. 1 scan

The following discussion is closed:

Resolved.

There are two scans, a .pdf and a .djvu, with the same naming scheme. Practice (for other scans) has been to use .djvu files, but there are generally more .pdf (non-.djvu) files available (if I am not mistaken). Could the proper scan be determined, and the data from the improper scan be transferred? TE(æ)A,ea. (talk) 19:48, 18 December 2019 (UTC).

DjVu scans are all-around easier to use on Wikisource. PDFs are easier to make, but have more technical problems. --EncycloPetey (talk) 19:50, 18 December 2019 (UTC)
The DjVu was a replacement for the PDF, and the pages have been moved to it.--Prosfilaes (talk) 06:25, 19 December 2019 (UTC)
This section was archived on a request by: Xover (talk) 11:02, 29 December 2019 (UTC)

Changes to Template:Header

To the template

some alterations
  1. for the parameter contributor there is now a synonym section_author
  2. for the parameter override_contributor there is now a synonym override_section_author

this was requested as there was the statement that "contributor" has some level of confusion. Whether we should migrate usage, and/or deprecate the term has not been discussed.

some additions
  1. the parameter section_translator, wikilinked
  2. the parameter override_section_translator, not wikilinked and takes formatting

This allows for the recording of translators of a subpart of a work, previously use of translator applied it to the section for the work, not the subsection.

The documentation has been updated

It is preferred that any discussion should be handled in a new section on this page, rather than as part of this announcement. Thanks. — billinghurst sDrewth 12:40, 30 December 2019 (UTC)

Empty categories

I would like to ask about the some currently empty categories (see below). Do they have any usage or can they be deleted?

  • Category:Pages containing image
    There is no explanation of what the category is meant for. There are thousands of pages containing images in Wikisource, but none of them has been categorized here so far. If there is a reason for this category's existence, it should be easy to populate it by some bot. If there is not, I suggest deleting it.
  • Category:Pages containing errors
    The category description says: "These pages of non-fiction contain some error on them.", without specifying what sort of errors is meant (spelling+grammar, factual errors by the author, factual errors caused by incomplete human knowledge in the time the work was written…). It is also not clear how it should be populated (by SIC templates or manually?). I suggest deleting it.
  • Category:Texts without page numbers
    It is not clear which texts should come here: texts whose original publications are numbered but these numbers are not mirrored at Wikisource (which is true for most texts here which are not backed by scans) or texts whose original publications are not numbered? If there is a reason for the existence of this category and after its aim is clarified, it can be at least partially populated by a bot. If there is no such reason, I suggest deleting it too. --Jan Kameníček (talk) 22:34, 27 December 2019 (UTC)
see also User:Hesperian "Decline. IMO, in a community this small, nothing created in good faith by a regular should be speedily deleted. This should be taken to WS:PD" and User:Billinghurst, User:John Vandenberg, User:Cygnis insignis. -- Slowking4Rama's revenge 00:23, 30 December 2019 (UTC)
They are labelled maintenance/tracking categories so they will presumably have templates that will populate them when something is incorrect in their use. So suggest leave them as they are doing no harm, though I cannot remember what they do. Adding some commentary to them is probably of value. If you are seeing them where you should not be seeing them, then plug in {{maintenance category}}. (Hopefully we are better at labelling, and use of <includeonly> these days.) — billinghurst sDrewth 07:08, 30 December 2019 (UTC)
They might be supposed to be filled by some templates, but one of possible reasons why they are empty is that the templates do not exist anymore. If their purpose is worth to keep them there should be some way to find it out and add it into the categories’ talk pages or somewhere. --Jan Kameníček (talk) 19:19, 30 December 2019 (UTC)
Fully agree that overt labelling and documentation is the way to go. Doesn't change my initial comment. As a community we misused <includeonly> simply for the sake of neatness. — billinghurst sDrewth 01:27, 31 December 2019 (UTC)

Checking page style for court cases

I recently created a page for Valvoline Oil Co. v. Havoline Oil Co., using the existing page, Universal City Studios, Inc. v. Reimerdes, as my source for wiki formatting.

I would like to know if there is anything I should change in the style of any future works I may add; if there's any formatting I shouldn't have included or anything I left out.

Qwertygiy (talk) 19:27, 29 December 2019 (UTC)

A few items: (1) I see no link to the original source of the text copy. A link to the source of the text copy should appear in the header, or on the item's Talk page. (2) You've overlinked. There is no reason to link to Wikipedia articles like "Magazine", "Advertising", or "New York". (3) The judge who authored the decision should be identified in the header or header notes.
Also, you can center an image without using the template; I've done this for the two images. --EncycloPetey (talk) 19:43, 29 December 2019 (UTC)

  Comment @Qwertygiy: Agree with EncycloPetey's comments, see Wikisource:Wikilinks. Maximise internal links, minimal external links where adds true value and unambiguous. So we would do either author = or contributor = for whomever wrote a judgement or wrote an opinion. We would normally do local author links in the body of the work for the judges cited and create relevant author pages.

Some questions and comments

  1. Are the references yours, or were they in the original document? If yours, then they should be moved to the talk page, and use the edition parameter, and a note to point to them. We try to present clean documents, not annotations.
  2. We would normally put the case into WD, if there is an article for the case at enWP, they can share the same item for case law, and this would provide the interwiki links.
  3. At some point we would/should create Portal:United States District Court for the Southern District of New York—their creation is organic, how many other works—sometimes even consider an anchored redirect a subsection to a parent portal page to make it easy to break it out at a later stage.

billinghurst sDrewth 08:54, 30 December 2019 (UTC)

  1. In regards to the references, they were all included verbatim in the source text (in parentheses) or were referencing earlier such citations as supra. Any citations that were integrated into the text rather than thusly isolated, I left in place and merely added links. My reasoning was that such a citation serves the same purpose whether in parenthesis or in footnote reference; the former was easier to create on a 1910s typewriter while the latter is easier to read on a 2010s webpage.
  2. In regards to Wikidata, I'll take a look at the procedures for that. I'm not very familiar with it yet, most of my contributions being solely at enWP.
  3. In regards to the portal, creating one that is a redirect to the subsection of the US case law portal seems like the best idea at the moment, since the half-dozen works added thus far seems a little too small and specific to justify having its own portal, but there are many thousands more that exist and just aren't added as of yet.
  4. In regards to the link to source, I left it in the original commit message; I'll add it to the talk page.
Qwertygiy (talk) 21:33, 30 December 2019 (UTC)

Cosmetic problem with {{header}} change

This conversation has been moved to Template talk:Header#title & contributor: one line or two? Levana Taylor (talk) 04:09, 31 December 2019 (UTC)

Versions and Wikidata problem

Version pages

In discussion with another editor, I've discovered that the information on Wikisource:Versions does not align with current practice.

Versions pages
Different versions of the same work are listed on "versions pages." Such pages are only for different versions of substantively the same work. Different works should not be listed together on the same versions page, even if they have the same title and/or author; they should be listed on disambiguation pages. This applies even to works that are reviews or analysis of the work. For example, Charles Lamb's prose retelling of Shakespeare's Romeo and Juliet (Shakespeare) is a version of that work, and belongs on a versions page with it. The entry entitled "Romeo and Juliet" in The New Student's Reference Work is a work about, rather than a version of, Shakespeare's play, and therefore should not be included on a versions page. (Works that share the same title are listed on disambiguation pages; works that share the same subject are listed on portal pages.)

Wikisource:Versions#Versions pages


The key section is: "Charles Lamb's prose retelling of Shakespeare's Romeo and Juliet (Shakespeare) is a version of that work, and belongs on a versions page with it."

Does this mean that movie scripts, operas, retellings, children's adaptations, etc. belong intermixed on the same versions page? And who is considered the "Author" on such a versions page, when each item would actually have a different person who wrote it?

There is an additional problem now that we are connecting to Wikidata. Romeo and Juliet (Shakespeare) is linked to Wikidata item d:Q83186, which is specifically for the play written by William Shakespeare. The retelling by Charles and Mary Lamb has a separate data item, because the author and publication information are different. If we are to treat versions pages as currently described at Wikisource:Versions (quoted above), then we must remove the link from Wikidata, because our content on that page does not match the Wikidata item. We would need to create a new kind of page that lists only editions of the work itself, separate from the versions/retellings/adaptations.

The problem goes yet deeper. If you do not see the issue at play here, look at Macbeth (Shakespeare) and Macbeth. The page Macbeth (Shakespeare) currently lists only editions of the play itself, not the retelling by Charles and Mary Lamb, nor the opera adaptation by Verdi. The page is already crowded with editions, and there are many more besides that are not yet listed, because it is a Shakespeare play. The disambiguation page Macbeth lists the other items by other authors. And note that we currently have three editions of Charles and Mary Lamb's Tales from Shakspeare retellings, in various stages of transcription. Will all of these editions be listed on the same page as all the editions of the Shakespeare play? If so, all editions of Verdi's opera and of any other editions of any derivative works would also all appear mixed together on the same page. Is this desirable?

The current wording of Versions is no doubt the result of an earlier, simpler time when Wikisource did not have many editions of the same work, and did not have to concern itself with the possibility of multiple editions of the same work, nor multiple editions of derivative works. I propose we reverse the current wording so that Versions pages explicitly do not include adaptations or retellings by other authors. --EncycloPetey (talk) 18:43, 30 December 2019 (UTC)

I was thinking about the same problem when I was dealing with some folk tales that were retold by various authors.
The problem might be solved if we had two kinds of version pages: versions of work and versions of story. --Jan Kameníček (talk) 19:07, 30 December 2019 (UTC)
The Italian Wikisource has adopted a separate "Opera:" (Work:) namespace for items that are the same work, but different editions. We could do the same. Having two different kinds of Versions pages would get messy anyway, however we tried to do it. If we opened a new namespace for the items that are the same work/author, that would free up Versions pages to treat items that are the same general story, but with different authors/wording. --EncycloPetey (talk) 19:12, 30 December 2019 (UTC)
What about translations? Does it mean that the new namespace would also host original works and the translation pages would become redundant? --Jan Kameníček (talk) 19:26, 30 December 2019 (UTC)
The pages which list Translations could be rolled into the Work: namespace. They would merely need to accommodate the information about the original language title. But yes, if we decided to go that way, it would mean that we would not need a separate set of Translations pages. The only difference right now between a Versions page and a Translations page is whether or not the original language of the work was English. And we have some marginal cases already which sit astride the two, such as Beowulf, which was written in Old English, so its page lists the Old English editions as well as translations into Modern English. With a separate Work: namespace, we wouldn't have that problem. --EncycloPetey (talk) 19:31, 30 December 2019 (UTC)
I don't really see the problem. Yes, pages like that need to have additional internal structure. But those pages are far and few between. Of course the novel version of "And Then There Were None" and the play version should be on the same version page.--Prosfilaes (talk) 19:46, 30 December 2019 (UTC)
You haven't stated any reasoning, and no, they are not few and far between, and it is a growing problem. Why should works written by different authors appear as "versions" on the same Versions page, instead of on a disambiguation page? Why should a ten page prose summary of the story of Macbeth appear on the same page as a 150 page play with stage directions, when the two have different authors and completely different text? Why not group them instead on a disambiguation page? --EncycloPetey (talk) 20:11, 30 December 2019 (UTC)
Shakespeare's w:Romeo and Juliet (c. 1590–1595) is based on Arthur Brooke's 1562 narrative poem "The Tragical History of Romeus and Juliet" and William Painter's 1567 collection of Italian tales, which included a version in prose named "The goodly History of the true and constant love of Romeo and Juliett"; Brooke's version was a translation into English of Pierre Boaistuau's 1559 French version, which was in turn a translation of Matteo Bandello's c. 1531–1545 Giulietta e Romeo; Bandello based his version on Luigi da Porto's c. 1524 Giulietta e Romeo; da Porto based his version on the c. 1476 Mariotto and Gianozza by Masuccio Salernitano, who draws on Dante's Divina Commedia (in canto six of Purgatorio), the Ephesiaca of Xenophon (c. 3rd century), and Pyramus and Thisbe from Ovid.
In the other direction there were the 16th/17th-century quarto and folio editions that may be said to be roughly the author's original editions; followed by something on the order of 25–30 main distinct editions up through the 19th century (starting with Nicholas Rowe, Alexander Pope, and Lewis Theobald in the first half of the 18th century; through the great Tonson editions edited by Samuel Johnson, George Steevens, and Isaac Reed; the 1790 and 1821 Malone editions; John Boydell's copiously illustrated edition; and up to the famous Cambridge/Globe and Arden editions). All of them aim to get at the "true" Shakespeare and seek to substitute their judgement for that of previous editors, leading to wildly differing results (not to mention Way Too Much Drama™ for historiography). And then we have things like Charles and Mary Lamb, who retell the plays in prose at a level aimed at children, and Thomas Bowdler, who expurgated the plays to be fit "for 19th-century women and children". And in contemporary versions we have modern spelling editions, manga versions, etc.
And then you get into adaptations: w:List of films based on Romeo and Juliet lists 150+ TV and movie adaptations alone. There are 8 ballets, 9 operas, 5 musicals, and 3 main compositions of classical music.
If we put everything on a versions page, Romeo and Juliet (Shakespeare) would have more than a thousand entries, more than half of them bearing little actual resemblance to the play that William Shakespeare wrote. We need to draw some lines somewhere: Shakespeare's works are extreme examples that help find those points in a way that Christie's paltry two versions do not. But the problem is general. --Xover (talk) 20:46, 30 December 2019 (UTC)
And the vast majority of works that have a version page will have two versions on there. Most books were never reprinted. Few were ever made into plays or came out in significantly different editions. A page for one of Shakespeare's plays can afford to be the exception. Moreover, "would have" is begging trouble from the future. Why not worry about what we have, instead of what we might have?--Prosfilaes (talk) 00:57, 31 December 2019 (UTC)
See Hymn and The Raven.--RaboKarbakian (talk) 20:35, 30 December 2019 (UTC)
Those are Disambiguation pages, which are a separate concern. They are not relevant to the current discussion. --EncycloPetey (talk) 20:48, 30 December 2019 (UTC)
There is a huge amount of variation in this kind of problem. In some cases it is pretty reasonable to consider the two works to be versions of the same work (e.g. the Hebrew and Greek versions of Esther). In some cases it's pretty reasonable to consider the two works to be completely different works sharing only a common underlying theme (e.g. the Routhier and Weir versions of O Canada, which should probably be converted to a disambig page). In the case of adaptations, for example La Fontaine's adaptation of The Tortoise and the Hare, or Seidenbusch's adaptation of Salve Regina, or the Lambs' prose adaptation of Macbeth, I would still consider them to be "versions" of the original work, and would list them on the original work's Versions page. However, I would note that they are adaptations rather than original versions (or direct translations), and generally would put them in a separate section of the Versions page. If the adaptation is significantly different from the original, I would also list it on a disambiguation page. —Beleg Tâl (talk) 20:51, 30 December 2019 (UTC)
(I might, however, give the adaptations their own versions page, and just link to them from the main versions page) —Beleg Tâl (talk) 20:54, 30 December 2019 (UTC)
Oh, here's a good example of what I mean: Alice's Adventures in Wonderland —Beleg Tâl (talk) 20:56, 30 December 2019 (UTC)
How would you feel about the proposal to create a new Work: namespace to solve the issue? --EncycloPetey (talk) 20:58, 30 December 2019 (UTC)
I must admit that I don't really see how a Work: namespace would solve the issue. The same problem you see currently on Versions pages will persist on Work: pages. The idea of using Versions pages for versions-of-story already exists in Portal space (e.g. Portal:Cinderella). We would just be moving the problem around, not addressing it. —Beleg Tâl (talk) 21:06, 30 December 2019 (UTC)
Furthermore: even if we tried to formalize a separate structure for version-of-work and version-of-story, this completely falls apart for most traditional folk stories and songs where there is no real difference between the two and every single edition is wildly different. How many version-of-work pages would you use for Tam Lin? How many version-of-work pages would you use for The Elfin Knight? —Beleg Tâl (talk) 21:11, 30 December 2019 (UTC)
Not to mention that folk stories often have closely parallel versions in different languages, so then you could have some that are "original tellings" in English alongside some that are translated from another language. They'd end up on different pages if there are separate Version and Translation pages. I wish there were enough Wikisource editors to do a Folk Stories Project and sort this stuff out into Portal-style pages instead. Version pages could be reserved for works that are author-associated, a work of that author particularly. E.G. HC Andersen’s "Tinderbox" is a retelling of an old tale but it is universally known as Andersen's "Tinderbox," thus it deserves its own versions page (or rather translations page in English). Levana Taylor (talk) 01:45, 31 December 2019 (UTC)
Right now, there is a broad scope in the problem, as you have noted, on the one hand between items that are the same work, and on the other items that clearly are not the same work. These disparate items may or may not appear on a Versions page at the whim of an editor. A Work: item would always be for a specific work, making a clear distinction between the two types of listings. The Work: namespace would also include translations. Right now, Esther (Bible) is a Translations page, only because the original was not written in English, even though they are simply different Versions of the same work. If we consider that the "original" Romeo and Juliet story was not in English, then "Romeo and Juliet" would need to become a Translations page, and this would be true of many of Shakespeare's plays, as his plays were not the first tellings of the stories. A Work: namespace would absorb all the Translations pages and lists of editions of the same work, and thus draw a clear divide between listings of the same work (Work:) and related works that are derived from each other, which would then be the focus of Versions pages. Right now, we draw a divide between works originally in English and works not originally in English. The proposal would shift that division to editions of the same work versus versions of similar works. The concern about wildly different editions is already a problem. There are two entirely different early editions of Shakespeare's King Lear, and some modern prints of Shakespeare's works include both for completeness. Despite the differences, they are clearly supposed to be the same work by the same author, whereas a retelling by Charles Lamb for Tales from Shakspeare is clearly not an edition of the same work, but is a related story. Works like The Elfin Knight are no different than copies of ancient works, where different manuscripts preserve or lose different passages. --EncycloPetey (talk) 21:21, 30 December 2019 (UTC)
Okay, now I'm just confused about what is being proposed. The suggestion that translations should be merged into Work space, even though translations are the ur-example of derived works by different authors, further reinforces to me that there is no such thing as a clear distinction between "the same work" and "not the same work". —Beleg Tâl (talk) 22:02, 30 December 2019 (UTC)
Translations preserve the content, even though the language has changed. A translation can be placed side-by-side with the original, aligning the texts. We have a translation namespace that does just that for multiple texts, from books of the Bible to poetry by Catullus. In contrast, a retelling by Charles Lamb will bear little resemblance to the parent text, even though it might be written in the same language. In a library catalog, translations of a work are still considered to be copies of the same work; only the language has changed. Whether you're reading Dante's Inferno in medieval Italian, modern Italian, English, or Chinese, it's still Dante's Inferno, and would be catalogued with Dante as the author. Retellings and adaptations will be catalogued under different authors.
To give a modern example: If I found a German "translation" of Stephenie Meyer's Twilight, I would expect it to have the same story, same characters, and same plot as the original; the same number of chapters, the same everything except the language. The novel Fifty Shades of Grey is a retelling of Twilight (originally as fan fiction) by a different author, of the same basic story. In the retelling, however, the setting is different, the character names are different, and there are no supernatural elements. It is a complete retelling. So, the translation bears more in common with its source text than a retelling. Under our current Versions page structure, both novels would be listed on the same Versions page because they derive one from the other. But an English translation from a German copy would be placed on a separate page, simply because it is a translation.
Translations in library catalogs and on Wikidata are treated simply as editions of the source text, with only a data code indicating that the language has changed. Retellings and derivative works are treated completely separately. The proposal would thus align Wikisource practices with both library databases and Wikidata structure. --EncycloPetey (talk) 22:56, 30 December 2019 (UTC)
Your view of a translation is idealistic. In reality, translations can (often silently) take huge liberties with their underlying work, frequently amounting to paraphrases or condensations.--Prosfilaes (talk) 01:05, 31 December 2019 (UTC)
I am aware of translational difficulties, but idealistic or not, it's the view taken at Wikidata and by library catalogs. The same disparity can be found in some editions of works, where spellings, vocabulary, punctuation, and more can be altered by editors. Compare the two "first editions" of Moby-Dick, which made different sets of corrections requested by the author; or the US and UK editions of A Clockwork Orange, for which the US publisher decided to omit the final chapter. Nevertheless, both the UK and US editions of Moby-Dick are considered to be the same work, as are the US and UK editions of A Clockwork Orange. But my point is that other works based on those novels, written by other authors, ought not to be considered the same work as the original. Currently, we make no such distinction. --EncycloPetey (talk) 01:23, 31 December 2019 (UTC)

  Comment Not certain that I wish to wade through that conversation. Keeping it simple. Fixing pages here is all that seems needed.

  • Our versions are for works by the author
    main namespace pages, can link to WD item for enWP article
    If we have versions pages that are out of scope, then fix them.
  • Disambiguation pages are for works of same/similar name by various authors.
    main namespace pages, that link to WD items for disambiguation
    Charles Lamb's retelling is not a version, it is a derivative work and to be disambiguated, and it can have notes that put it in context to the original work. Contributors who have morphed these pages should be pointed to this conversation. Disambiguation pages can be structured to capture some of the aspects of derivative works.
  • Translations versions are for works of the original author, where there are different translations/editions of translations
    main namespace pages, can link to WD item for enWP article
  • Portal: ns pages exist for curation where required.
    portal namespace, would link to WD item for portal (see topic's main Wikimedia portal (P1151) and Wikimedia portal's main topic (P1204))
    Not encouraging these for the work level, though if someone wishes to put the work into creating something that explains a subject matter, and a range of disambiguations, versions, translations, and retellings, then go for it. It would not replace these main namespace pages.

billinghurst sDrewth 01:10, 31 December 2019 (UTC)
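The four page types listed above can be sketched as a small decision rule. This is only an illustration under simplifying assumptions: the `Text` record, the `listing_kind` function, and the idea that "same work" can be approximated by matching title and author are all hypothetical and are not an existing Wikisource tool or policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    """Hypothetical record for a hosted text (illustration only)."""
    title: str
    author: str
    original_language: str = "en"


def listing_kind(anchor: Text, candidate: Text) -> str:
    """Return the kind of listing page that would group candidate with anchor.

    Assumption: same title and same author stands in for "same work";
    real cases need editorial judgement.
    """
    same_author = anchor.author == candidate.author
    same_title = anchor.title.casefold() == candidate.title.casefold()
    if same_author and same_title:
        # Same work: versions page if written in English, else translations page.
        return "versions" if anchor.original_language == "en" else "translations"
    if same_title:
        # Same or similar name, different author: disambiguate.
        return "disambiguation"
    # Anything looser is left to optional curation in Portal space.
    return "portal"


play = Text("Romeo and Juliet", "William Shakespeare")
retelling = Text("Romeo and Juliet", "Charles and Mary Lamb")
inferno = Text("Inferno", "Dante Alighieri", original_language="it")

print(listing_kind(play, play))        # versions
print(listing_kind(play, retelling))   # disambiguation
print(listing_kind(inferno, inferno))  # translations
```

Under this rule the Lambs' retelling lands on the disambiguation page rather than on the versions page for the play, which is the outcome described in the second bullet.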

But your first item is part of the problem. We currently have advice on versions that differs from what you've described, so some kind of change is necessary. The question is what sort of change? --EncycloPetey (talk) 01:15, 31 December 2019 (UTC)
Please be specific of which bit. Inexact commentary is less than helpful. — billinghurst sDrewth 01:24, 31 December 2019 (UTC)
Read the opening three paragraphs of this discussion. Currently, our guideline advice would put Charles Lamb's retelling of Shakespeare's Romeo and Juliet on the same versions page with the play itself. Your comment says otherwise. Hence, we either need to align practice with the advice, or alter our advice to fit practice, or make some other change to resolve the discrepancy. --EncycloPetey (talk) 01:28, 31 December 2019 (UTC)
Gotcha, if you are quoting a page, can I suggest {{cquote}} so that it is overt. I missed the cue, thought it was your comment. [Note to self don't try and write abusefilters, analyse abuse and try to have conversations. My apologies.] — billinghurst sDrewth 01:33, 31 December 2019 (UTC)


Example of Wikidata item

General comment about works at WD and enWP, the item Romeo and Juliet (Q83186) is for the conceptual work and all that follows, not solely the play. It has from what it was based itself, and derivative works. So I don't see an issue with how it is being linked. It is how we wish to look at the latitude of subsidiary links. — billinghurst sDrewth 01:41, 31 December 2019 (UTC)

The WD item points to derivative works, but doesn't list them as editions of the work itself. The WD structure identifies editions of the work by pointing to editions pages for the editions, and to derivative works by pointing to a data item for the derivative work, which in turn points to editions of that derivative work. Each derivative work has its own separate WD data item, with its own list of editions. Currently, we seem to make no such distinction. So while Wikidata has separate data items for Macbeth, the play by Shakespeare, and Macbeth, the opera by Verdi, our advice currently would lump them into a single page. --EncycloPetey (talk) 01:53, 31 December 2019 (UTC)
I believe that is a limited perspective, our WD linkage points to a central coordinating point of the concept of the work. The WP article w:Romeo and Juliet there is more than the words of the work itself. WikiCommons' c:Romeo and Juliet is definitely not focused on the base publication, they are the concept. Our versions page focuses on the conceptual, editions, and derived works, and we are arguing about derived works, and the interwikis definitely cater for that aspect, so I don't see a huge difference. The focus of our page is not on the derived works, each of our presentations has/should have its own item, and that has been our practice — billinghurst sDrewth 02:32, 31 December 2019 (UTC)
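A minimal sketch of the structure described in the comment above, assuming a simplified model in which a work item holds its own editions and points to derivative works that are separate items. The `Work` class, both functions, and the edition labels are hypothetical placeholders; none of this is the actual Wikidata schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Work:
    """Hypothetical stand-in for a work-level data item."""
    label: str
    author: str
    editions: list[str] = field(default_factory=list)
    derivative_works: list["Work"] = field(default_factory=list)


def editions_page(work: Work) -> list[str]:
    """List only editions of the work itself (the Wikidata-style view)."""
    return list(work.editions)


def combined_page(work: Work) -> list[str]:
    """List editions of the work and of every derivative work, mixed together
    (what the current Versions wording would produce)."""
    entries = list(work.editions)
    for derived in work.derivative_works:
        entries.extend(combined_page(derived))
    return entries


# Placeholder edition labels, not real catalogue entries.
opera = Work("Macbeth (opera)", "Giuseppe Verdi", editions=["opera edition A"])
play = Work(
    "Macbeth (play)",
    "William Shakespeare",
    editions=["play edition A", "play edition B"],
    derivative_works=[opera],
)

print(editions_page(play))  # ['play edition A', 'play edition B']
print(combined_page(play))  # ['play edition A', 'play edition B', 'opera edition A']
```

The difference between the two functions is the distinction at issue: the first keeps each author's editions on their own item, while the second mixes the opera's editions into the play's listing.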
The Wikipedia article is about the play. If the Wikipedia article were listed here it would be placed on a disambiguation page. So comparing what the Wikipedia article does to what we do is fallacious reasoning. Commons content will vary depending on the number of items in a category. When a category grows large enough, subcategories are split off. So, for example, commons:Category:Medea (Euripides) is for the play by Euripides, but there are subcategories for the translation of the play by Augusta Webster and for the 2009 Syracuse performance of the play. And I don't think you've actually looked at what commons:Category:Romeo and Juliet has or what subcategories it contains. The key here is that neither Wikipedia nor Commons make distinctions between items that are the work and items about the work. So if we follow your line of reasoning (based solely on mimicking Wikipedia and Commons) then our "Versions" pages ought to contain items that are the work, as well as items about the work, which is what we currently do with Portals. Is that what you are advocating for?
Further, the interwikis between Wikipedias link only to other articles about the Shakespeare play. The interwikis to Wikiquote link only to pages quoting the play. The interwiki links to other Wikisource projects link only to translations of the play. So I don't see any truth to your claim that the interwikis cater to derived works. Yes it's possible to navigate by following additional links, but we do that with {{similar}}.
That said, are you advocating for a change to the advice on the quoted page, or are you advocating for something else? Your preferred course of action isn't clear. --EncycloPetey (talk) 17:29, 31 December 2019 (UTC)
I beg to differ about the WP article, it is an article about the play, and a zillion things that have sprung from it. If it mentions "legacy" and talks about ballet, it has gone beyond the play, and is about the conceptual work in a broader aspect including deemed pertinent derivatives. Commons has artwork that is not from the original play, it is someone's conceptual interpretation from the play, these are derived works on the subject. So as such, I don't see it as black and white as you do for the other sites, and the WD item. That said, I am awaiting others' comments, they are important for framing. — billinghurst sDrewth 12:15, 1 January 2020 (UTC)

Cakebot1 has been working on Index:Works of Charles Dickens, ed. Lang - Volume 1.djvu and transcluding as subpages of the "The Works of ..." To me this is not a good place for these works to be, and we touched on this conversation above at #Main namespace works; portal works and tendency to encyclopaedic components or listings. The titles that we use should be representing the works. As we already have The Pickwick Papers we need to determine a number of things.

  1. Whether we prefer to have 32 volumes of works reproduced as subpages (with all the inherent relative link resolutions), or a work to be a rootpage
  2. If the above determines that it is a rootpage, then how we would name works from volumes, when they will also have been previously published
  3. Moving the existing work, as it becomes the disambiguation.

My preference is to move and present the work as individual works, and to create a portal page to support the series as re-published. I would like to get this addressed early, prior to the contributor getting well into the work, and before we get further volumes popping up. [Also noting that we will need to fix the chapter numbering to our style]. — billinghurst sDrewth 00:23, 22 December 2019 (UTC)

This was published as a single collection under a uniform title. Moving these to the main namespace as separate items would break the implicit connection of the series. We'd have to disambiguate these copies from other copies if they move to the main namespace, which would create even more work. We already host "works of..." sets for several other authors this way, so I see no problem with hosting it as is, under the common title with subpages. --EncycloPetey (talk) 00:37, 22 December 2019 (UTC)
We have to disambiguate either way. Please tell me how "Works of ..." is useful, and how it represents the work of the author? It more seems to reflect the later work of a publisher, and just a series build, nothing more. That we made that choice previously could just be an indication of a poor choice, not evidence of how we should do things. If we are to keep these under "Works of ...", then "Volume 1" is not helpful. — billinghurst sDrewth 03:10, 22 December 2019 (UTC)
Special:PrefixIndex/Works of displays our "Works of", about 50 pages, 25 seem to be redirects. — billinghurst sDrewth 03:14, 22 December 2019 (UTC)
I think you're confused here. This is not a work by Dickens; it is a work by Andrew Lang that includes works by Dickens. Lang has provided selection, ordering, editing, formatting and layout, introductions and prefatory matter, and indices. Extracting the parts of this work that are derived from Dickens' and placing it out of the context in which it was published would be misleading, and would be a disservice to, e.g., those who wish to compare Lang's selection or editing with that of earlier or later editions. The included Dickens works are obviously also editions of independent works, and so should appear on versions pages for those works, but that in no way affects the status of Lang's work. --Xover (talk) 08:55, 22 December 2019 (UTC)
I agree with Xover here. Things should generally be organized as they were published, and for most authors with Works collections, there are different variations to worry about, making it more important.--Prosfilaes (talk) 15:43, 22 December 2019 (UTC)
billinghurst, not all of our "works of" publication sets begin with those exact words, or even have the word "works" in their title, so your count heavily underestimates the number of such items we currently have. We have titles like The complete poetical works and letters of John Keats; Poetical Works of John Oldham; Victor Hugo's Works (Guernsey Edition); The Writings of Henry David Thoreau; The Plays of Euripides; or Masterpieces of Greek Literature. Not to mention the many magazines, newspapers, journals, and periodicals that are set up just like this. --EncycloPetey (talk) 21:13, 22 December 2019 (UTC)
Strongly agree with Xover and the others: compilation and anthology works are still works per se and should be preserved as such. Versions pages and redirects are the correct way to connect top level mainspace titles with editions that appear in such collections. Also, I've spent a lot of time consolidating loose works into their appropriate collections, so I have a personal vested interest here also. —Beleg Tâl (talk) 15:42, 23 December 2019 (UTC)

This seems to me a case similar to The Novels and Stories of Henry James: many works published separately over a period of time, but also claimed to be volumes of a set. I did The Novels and Tales of Henry James (n.b. Tales not Stories) in the form The Novels and Tales of Henry James/Volume 2/The American/Chapter 1, and it is still that way now, but I have come to the view that it isn't right and doesn't work. For The Novels and Stories of Henry James, I named each work as a distinct publication (which it is), e.g. Confidence (London: Macmillan & Co., 1921), and simply link them from the works page The Novels and Stories of Henry James. I commend this method to you as the cleaner, and more in accordance with "things should generally be organized as they were published". Hesperian 01:35, 23 December 2019 (UTC)

It's easy to do that for novel-sized works. What about New Hampshire (Frost)? Should each and every poem be in main space? Does that mean that if we do the Atlantic Monthly, a periodical where at least one of those poems was first published, that everything in it should be in main space? I can see the argument that a series of novels should be broken out by title, but I think it gets real messy in the case of anthologies and periodicals, especially when we're talking about excerpts of longer works and intros that cover multiple works.--Prosfilaes (talk) 11:36, 23 December 2019 (UTC)
No, of course not. New Hampshire is a single published work. These sets are not single published works. They are multiple individually published works that the publisher has declared to be volumes in a set. That declaration doesn't change the fact of them having been published individually. If each poem in New Hampshire had been published and printed and bound and sold separately, and subsequently declared by the publisher as aggregating into a single work, then, and only then, would I say that each and every poem should be in mainspace. Hesperian 06:10, 3 January 2020 (UTC)
I don't know that these sets are not single published works; I know that they're separately bound works. Not all the works in The Novels and Tales of Henry James were published separately; presumably many of the short stories were never so published, and the fact that the various collections of stories don't overlap means that they are functionally volumes in a set. Looking at volumes 10-18 of The Novels and Tales, we're going to want those volumes as separate volumes. In practice, I don't think your proposal is much different from saying novel-sized works, besides making arbitrary historical distinction based on which works happened to have standalone publication and which were originally published in periodicals.--Prosfilaes (talk) 07:50, 3 January 2020 (UTC)
I am not saying that individual works, e.g. single poems, should be presented here separately: no way. I am saying that the books in this set were separately prepared, printed, bound, issued on distinct dates and made available for sale individually (and ultimately will fall into the public domain separately over several years). They are distinct publications, in the literal sense of 'publication': made available to the public via the shelves of the local bookshop. Such distinct publications should be presented here as individual texts, not as subpages of a larger set, where that set is not a publication in that literal sense. Hesperian 23:44, 5 January 2020 (UTC)
I guess I'm saying that The Novels and Tales of Henry James exists only as a concept created by the publisher, not as a literal publication. In which case it is also relevant to note that the publisher Bernhardt Tauchnitz had a 'set' entitled Collections of British Authors that was added to for over 100 years, and ended up with 5370 volumes. Hesperian 23:56, 5 January 2020 (UTC)
But the only name for volume 16 of The Novels and Tales of Henry James is The Novels and Tales of Henry James, volume 16. You can say the same thing about periodicals, which are much more likely to end up with 5,370 volumes.--Prosfilaes (talk) 05:22, 6 January 2020 (UTC)
No indeed. The half-title page reads "The Novels and Tales of Henry James / Volume 16", but the full title page reads "The Author of Beltraffio, The Middle Years, Greville Fane, and Other Tales", and it is commonly indexed under that title.[15] I can't find a good scan of Volume 16, but here's Volume 18. Hesperian 23:57, 6 January 2020 (UTC)
True; which may not be the case for other works. I will note that "Famous Story and other works" can drive bibliographers and collectors nuts, as it can frequently refer to many distinct collections by many publishers.--Prosfilaes (talk) 03:02, 7 January 2020 (UTC)
But it's not so easy to break the novels out in this case, either. It would still be messy. Volume one of the Dickens collection is not the novel; it's volume one of the set as well as volume one of the novel, and the novel exists across more than one volume. --EncycloPetey (talk) 17:25, 23 December 2019 (UTC)

OCR: Enable the Google-based version, until Phe's Tesseract version is operational?

I have recently read the discussion about broken OCR in some detail. The most recent comments point out that it would be up to English Wikisource to enable a (temporary) replacement until the traditional OCR tool is (hopefully) back in working order, or replaced with a better version.

Since we have a reasonably functional option based on Google's OCR, is there any good reason not to enable that by default, pending a more ideal outcome? Pinging some users involved in the discussion: @Xover, @Koavf, @Tpt, @Ineuw, @Jdforrester (WMF), @AKlapper (WMF), @Jan.Kamenicek: -Pete (talk) 21:19, 19 December 2019 (UTC)

I have the Google OCR button in my gadgets, but my experience is that its output is so bad that I do not use it, so enabling it by default does not solve anything for me. However, I understand that for some people (especially the new ones) it may be better than nothing. The main thing I am afraid of is that once we get a "reasonably functional" tool, we will never get a well functional one. --Jan Kameníček (talk) 23:40, 19 December 2019 (UTC)

Yes, my experience is similar. But it depends on the text -- for some texts, it does a pretty nice job. IMO it's more important to have something for new users than nothing, when it comes to OCR. For the reasons you describe, I'm sure that Wikisource users would continue to advocate for something more functional regardless of whether or not the Google one is enabled, so I do not share your concern in this instance. -Pete (talk) 23:52, 19 December 2019 (UTC)
Like Jan, I have had little success with Google's OCR tool. I usually find that it's easier to type it by hand when the OCR tool isn't working. But that is in part because I work heavily with: (a) Plays or poetry, where the formatting, capitalization, and punctuation do not follow standard sentence patterns. (b) Works with footnotes, which are in a different size and format, and therefore cause the OCR to bork. (c) Works that contain bits of text in other languages, which never come out right. (d) Works that contain special diacritical marks. (e) Works that contain unusual archaic typography, such as special characters for "ct", or italicized script that the OCR can't handle. If you're working on a text that consists primarily of standard sentences and paragraphs, without italics or any special characters, and without archaic spellings or archaic typography, that is entirely in English, then Google's OCR might be useful. But for me, it isn't. --EncycloPetey (talk) 00:47, 20 December 2019 (UTC)
Something is better than nothing and there is no traction at Phab. I think the OCR tool that I am using now works fine. —Justin (koavf)TCM 00:50, 20 December 2019 (UTC)
  Support If this is a proposal, make Google OCR the default, since that is the only working OCR. All this means is that it will be listed on the Gadgets page under Editing tools for Page: namespace instead of Development. — Ineuw (talk) 05:39, 20 December 2019 (UTC)
  Support Why not? --Xover (talk) 06:26, 20 December 2019 (UTC)
PS. Aklapper and Jdforrester are just processing and trying to manage all Phabricator tasks (there're a couple of thousand open tasks, iirc, all told). Neither one of them will have any particular opinion on this issue, or the specific Phab regarding Phe's OCR, so there's no need to ping them here. --Xover (talk) 06:43, 20 December 2019 (UTC)
Note that the privacy policy requires informed consent for users before sending their data to non-Wikimedia services, which includes Cloud Services (like the proxy for this tool). The gadget as-is is in violation of the privacy policy and should be fixed to add a modal consent form (immediately, and definitely before this is enabled for users by default). Jdforrester (WMF) (talk) 08:28, 20 December 2019 (UTC)
@Samwilson: ^^^ FYI. I'm trying to read up / do some digging on this to try to figure out what wriggle room there is and / or the broader impact on other gadgets. --Xover (talk) 16:58, 20 December 2019 (UTC)
@Jdforrester (WMF): Why is a proxy hosted and run by the WMF considered a "non-Wikimedia service"? Which part of the privacy policy deals with this? Kaldari (talk) 19:43, 20 December 2019 (UTC)
I guess you could argue that the API is non-Wikimedia. I'd still like to know what the actual wording of the policy is that relates to this, though. Kaldari (talk) 19:55, 20 December 2019 (UTC)
i’ll believe in the "privacy policy" scruples, when i see them implemented in the m:IP Editing: Privacy Enhancement and Abuse Mitigation. until then, editors should expect to be constantly surveilled across all projects. Slowking4Rama's revenge 16:55, 21 December 2019 (UTC)
  Support provided that we first implement the privacy form mentioned by Jdforrester above. -Pete (talk) 17:50, 20 December 2019 (UTC)
@Peteforsyth: Note that that privacy policy issue is purely a formal requirement thing in this particular instance. There is no information that would normally be considered privacy sensitive being transmitted anywhere for this case.
When you hit the OCR button (and only when you actively press the button), the gadget sends the language code of the project (i.e. "en" here on enWS) and the URL of the scanned page image to the Toolserver. The Toolserver doesn't see your IP address because the request passes through a proxy server (managed by the WMF like the wikis). The OCR tool on Toolserver fetches the scanned page image and passes it and the language code to Google's Vision API (all Google sees is the scan image, the language code, and the IP of the Toolserver; your browser never communicates with Google directly). The Google API then returns the extracted text, which the tool on Toolserver returns to your web browser, and which the gadget code then inserts into the text field for editing.
And just to rub salt in the wound, the Google OCR tool/gadget was, AIUI, developed by the WMF Community Tech team; meaning that not only is no actually sensitive data being transmitted, but every component involved that might conceivably be an attack vector is actually under the WMF's direct control.
That said, the privacy policy is not optional and not subject to per-project policies, so we'll have to figure out some way to make this work within those requirements. I'm just not sure how the heck to do that just yet (there is no standard facility for displaying such a prompt, and ditto for saving that choice for next time; showing a confirmation dialog for every single page is… not even an option). --Xover (talk) 18:42, 20 December 2019 (UTC)
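The flow described above can be sketched roughly as follows. This is only an illustration in Python, not the gadget's actual code (which runs in the browser): the endpoint URL, the parameter names `lang` and `image`, and the `ConsentStore` helper are all hypothetical. It is meant to show two points from the comment: the request carries only the project language code and the scan image URL, and a consent answer can be remembered so that the prompt appears once rather than on every page.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real tool's address is not given in this discussion.
OCR_ENDPOINT = "https://ocr-tool.example/api.php"


class ConsentStore:
    """Remembers a user's answer so the prompt is shown once, not per page."""

    def __init__(self):
        self._answers = {}

    def has_consented(self, user, ask):
        # Call the prompt only when no answer is stored for this user.
        if user not in self._answers:
            self._answers[user] = bool(ask())
        return self._answers[user]


def build_ocr_request(lang, image_url):
    """Build the request URL: only the language code and scan URL are sent."""
    return OCR_ENDPOINT + "?" + urlencode({"lang": lang, "image": image_url})


def request_ocr(user, lang, image_url, store, ask):
    """Return the request URL if the user consented, otherwise None."""
    if not store.has_consented(user, ask):
        return None
    return build_ocr_request(lang, image_url)
```

In this sketch, `ask` stands in for whatever dialog would eventually be shown; how and where the stored answer would actually be persisted on the wiki is exactly the open question raised above.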
Makes sense, and thanks for the explanation. My "condition" should not be interpreted too strictly; I of course defer to those more knowledgeable than myself about the proper way to handle this. -Pete (talk) 20:14, 20 December 2019 (UTC)
Google OCR is excellent at reproducing accented Latin characters for my projects about Mexico. I also used the OCR on French Wikisource and it also works very well. It seemed to me that it is also Phe's OCR tool. I was hoping to figure out how I can link to it in my vector.js. I asked this in the French Scriptorium but received no reply. Perhaps someone here can figure it out and let us know? — Ineuw (talk) 10:19, 12 January 2020 (UTC)

Happy Public Domain Day!

Here are some things entering the public domain in the next several hours: https://web.law.duke.edu/cspd/publicdomainday/2020/ Justin (koavf)TCM 06:29, 31 December 2019 (UTC)

Do our "1923" templates need to be updated? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:53, 31 December 2019 (UTC)
They shouldn't. {{PD-1923}} was adjusted last year to automatically progress, and {{PD/1923}} uses the code of the first. --EncycloPetey (talk) 17:17, 31 December 2019 (UTC)
Template:PD-anon-1923 does need converting to make it automatic. It's currently stuck on 1924. BethNaught (talk) 11:55, 1 January 2020 (UTC)
We should rather be updating and using {{PD-anon-1996}}, and just leave 1923 alone. — billinghurst sDrewth 12:05, 1 January 2020 (UTC)
@BethNaught: I have updated {{PD-anon-1996}} and {{pd/1996}} so they display dates and text appropriately for 2020. They are relatively done, so progression each year will be fine too. — billinghurst sDrewth 13:46, 1 January 2020 (UTC)
1923 is not relevant for anonymous works anymore, and 1996 conflates "expired in the US" and "not renewed by the URAA", which isn't something we should be doing.--Prosfilaes (talk) 19:56, 1 January 2020 (UTC)

┌─────────────────────────────────┘

Not only do we have at least three templates with "1923" in their names (and quite why we have {{pd/1923}} and {{pd-1923}}, differentiated only by one punctuation character, for apparently very different functions, is anybody's guess), but they hard-code Category:PD-1923 and Category:Author-PD-1923. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:23, 1 January 2020 (UTC)

We have PD-1923 for works where we know the publication date but not the author's date of death, or where the authorship is complex (multiple or corporate); it also only gives US status. PD/1923 allows the split copyright of US and home country based on the author's date of death. PD/ gives us the indication of when we can move to Commons, whereas PD- does not. — billinghurst sDrewth 12:39, 1 January 2020 (UTC)
"PD/ gives us the indication of when we can move to Commons, whereas PD- does not" I know what they do; but does anyone seriously think that "/" vs. "-" is the best way to convey that to colleagues, especially new editors? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:33, 1 January 2020 (UTC)
Slashes mean they are subpages, and accordingly can be relatively linked. — billinghurst sDrewth 13:44, 1 January 2020 (UTC)
Template:Pd/1923 is not a "subpage", because Template:Pd does not exist; it's just badly named. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:33, 1 January 2020 (UTC)

┌─────────────────────────────────┘

The year 1923 is no longer relevant (in fact no fixed year is relevant now, the elapsed time is, so maybe the wording of the text can also be updated). So I suggest:

  1. to rename the template {{PD-1923}} to {{PD-US-95}}.
  2. to rename the template {{PD/1923}} in a similar way, e.g. to {{PD/US-95}}, or even better to merge it with the previous one
  3. to rename the template {{PD-anon-1923}} to {{PD-anon-US-95}}, or merge it with the previous ones, making the anon just their parameter, e.g. {{PD-US-95|anon}}
  4. to change the texts to "This work is in the public domain in the United States because it was published more than 95 years ago. …" or something similar. --Jan Kameníček (talk) 12:28, 1 January 2020 (UTC)
While I agree that 1923 has progressed, all the years of publication are relevant thereafter, and it is best to just keep it all harmonised. It was chosen that way to keep it simple; simply pick the year period, add your publication year. "-95" is just going to cause issues, is it -95, -95+1, -95 from today, how many is -95. Fixing the templates at the back end, is pretty easy, and I will just plan to get it done. — billinghurst sDrewth 12:34, 1 January 2020 (UTC)
I do not see any difficulties. It would be as easy to use as e.g. {{PD-anon-70}} or {{PD-old-70}} at Commons, and most people coming here usually have experience with Commons templates. Or alternatively, it can be renamed to {{PD-US-96}} with the text "This work is in the public domain in the United States because it was published at least 96 years ago. …" The documentation can not only explain it further, but even specify what the latest acceptable year is (with automatic update of the year). If the templates were merged into one, everything would be perfectly harmonized. --Jan Kameníček (talk) 14:36, 1 January 2020 (UTC)
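The "-95 or -96" arithmetic can be made concrete with a small sketch. It assumes the rule implied elsewhere in this thread (works published in 1923 entered the US public domain in 2019, and 1924 works on 1 January 2020), i.e. a term of 95 full calendar years after the year of publication. The function names are made up for illustration, and the sketch deliberately ignores no-notice and no-renewal cases and any dispute over which date starts the clock.

```python
from datetime import date

# Term counted in full calendar years after the year of publication.
US_TERM_YEARS = 95


def latest_pd_publication_year(today=None):
    """Latest publication year whose 95-year term has already run out."""
    today = today or date.today()
    return today.year - US_TERM_YEARS - 1


def us_pd_entry_year(publication_year):
    """Year on whose 1 January a work from publication_year becomes PD."""
    return publication_year + US_TERM_YEARS + 1


def is_us_pd_by_age(publication_year, today=None):
    """True if the work is PD in the US purely because of its age."""
    return publication_year <= latest_pd_publication_year(today)
```

Because the cutoff is derived from the current date, a template built on the same calculation would roll over on 1 January without anyone editing it, whatever name it ends up with.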
The problem with using a template like "PD-anon-70" is that it is applicable for 10 years or less. Eventually, we will reach 80 years, 90 years, and 95 years, at which point such templates must be replaced based on the increasing number of years since the author's death or the work's publication. Any template that is set to operate based on a fixed range after the author's death or the work's publication date will produce this issue of perpetual monitoring. Such an approach might be possible on Commons, with a larger community to constantly adjust, but for a smaller community like Wikisource, it is not the best approach. It is better to have templates that adjust their display based on information provided about the date of publication or date of the author's death. --EncycloPetey (talk) 18:43, 1 January 2020 (UTC)
100% agree that we should remove "1923" because it's not semantically meaningful--we're just saying, "public domain in the United States due to expiration of copyright", whenever that is. —Justin (koavf)TCM 12:49, 1 January 2020 (UTC)
People may indeed be familiar with the license names at Commons, but the problem on Commons is that those templates do not perform the same functions there as they do here. Over there, an item needs two templates: one for pma licensing and another for US licensing. Sometimes there is a combined template, but sometimes not. Also, if we use the same naming as Commons, people may assume our licensing works just like Commons, and it doesn't. I'm not saying that we shouldn't change licensing templates, but I am saying we shouldn't be looking at the confusion on Commons to decide what we should do here. You only have to look at the parameter listings on commons:Template:PD-US-expired to see how confusing a single template can become. Our current system is much easier to use. Also, a reminder that this same discussion happened in 2018. --EncycloPetey (talk) 16:41, 1 January 2020 (UTC)

┌─────────────────────────────────┘

Well, although I really do not see anything difficult in the templates I proposed, I can live with other kinds of templates too. However, I am convinced that whatever templates are created or updated, they should:

  • have some comprehensible and general name which does not have to be changed every year (current PD-1923 is an example of a template’s name not suitable for our purposes). PD-US-96 is imo suitable, but I am open to other suggestions too.
  • be updated automatically and without any necessary human interference a second after the old year finishes and the new one starts, so that contributors have correct templates at hand immediately. --Jan Kameníček (talk) 18:58, 1 January 2020 (UTC)
I'd prefer -95 or US-expired; 95 is confusing, but so is -96 and any other choice, and it's consistent with -70 and -50. They definitely should be changed. I might argue for moving PD-old to PD-old-100 and replacing PD-old with a warning message, since it is the biggest confusion with Commons users.--Prosfilaes (talk) 19:56, 1 January 2020 (UTC)
There is a quasi-project ongoing at Wikisource:Requested texts/1924 for Public Domain Day; I am adding The Box-Car Children (darker story in the 1924 version than in later versions, interestingly). So get over there now and add a few. Lemuritus (talk) 21:13, 1 January 2020 (UTC)
I agree with user:billinghurst that we should leave 1923 templates as they are. They are true indicators of when a work came to be in the public domain. 1924 public domain works should be indicated as such. Both of these have informational value even if it sounds incongruent. — Ineuw (talk) 19:26, 2 January 2020 (UTC)
@Ineuw: But the current 1923 templates do not indicate when the work came to be in the public domain at all. They are used for works published until 1922 which came into the public domain in 1998 (and I think many of them even much earlier), as well as for works published in 1923 which came into the public domain in 2019. --Jan Kameníček (talk) 16:51, 20 January 2020 (UTC)
  •   Comment having some more time to think about this, I am wondering whether we just look to having {{PD-US|year of death}} and we gracefully deprecate both {{pd/1923}} and {{PD-1923}} and revert its text back to where it was. We can copy the current updated code over to this new template, and if there are any 1923+ works, then we simply update them over to the new template. I think the use of "expired" is just superfluous, and we can write that into the text.

    My reasoning for PD-US is that it is simple and it somewhat aligns with Commons. We merge the logic of these templates: if YYYY for DoD is given it displays the death date PMA text; if no date is given then it just gives the standard "out of copyright" text. The PD-1996 and PD/1996 remain as they are as they still have determinative conversations based on year of publication, and year of death; once a year we would run a bot through and convert those on the 95 year boundary. — billinghurst sDrewth 13:35, 3 January 2020 (UTC)
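    The merged logic proposed here could look something like the sketch below. It is written in Python purely to illustrate the branching, not as template code. The function name and the display wording are invented, and it assumes that post-mortem terms run to the end of the calendar year, so an author who died in 1949 clears "life plus 70" on 1 January 2020.

```python
from datetime import date


def pd_us_text(death_year=None, today=None):
    """Return licence text; add a PMA sentence only if a death year is given."""
    today = today or date.today()
    base = "This work is in the public domain in the United States."
    if death_year is None:
        # No death year supplied: show only the standard out-of-copyright text.
        return base
    # Full calendar years elapsed since the end of the year of death.
    pma_years = today.year - death_year - 1
    text = base + f" The author died in {death_year}"
    if pma_years > 0:
        text += (
            ", so it is also in the public domain where the term is"
            f" the author's life plus {pma_years} years or less"
        )
    return text + "."
```

    Calling `pd_us_text()` with no argument gives the plain US text, while `pd_us_text(1949)` adds the author-based sentence, so one template name could cover both of the current uses.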

    Agree. --Jan Kameníček (talk) 22:38, 6 January 2020 (UTC)
    Would this template be used for works published less than 95 years ago? The potential problem I see with having PD-US as a template name is that users may place it regardless of the date of original publication. That is, will this template cover "no-renewal" and "no-notice" situations, or will those templates still serve the separate function? If so, then we may be creating a whole new problem. Nor can we rely on the date of that edition's publication as a guideline, since we need to know the date of first publication and/or date of copyright registration, since that is not always the same, and which is not always included by an uploader. For example, I have come across a work whose initial publication date is 1927, but will not enter PD until 2024 because copyright was filed and renewed within six months and overlapped into the following year. If we're going to overhaul things, we need to consider sources of confusion and what information will be needed by someone to verify the template is correct. --EncycloPetey (talk) 23:05, 6 January 2020 (UTC)
    I agree that we should not have one template for works published more than 95 years ago and no-renewal and no-notice situations. I'm not a fan of a flat PD-US, but a mere naming of a template (the main template) isn't going to stop licenses from being carelessly placed on works.
    I don't understand what you mean by the publication date, though. My understanding is that works could be filed for copyright anytime in the first 28 years, even alongside renewals, but the clock started on the earliest of publication, copyright date, or registration.--Prosfilaes (talk) 23:44, 6 January 2020 (UTC)
    If the work was first published in the UK, there was a grace period in which to register a US copyright (six months?). I have found an instance where the initial publication in the UK happened at the end of 1927, but the initial US copyright was granted in early 1928, followed by a renewal. I have verified this with the copyright database at Stanford. So in this instance, the date of initial publication (in the UK) cannot be used to determine copyright status within the US. --EncycloPetey (talk) 00:17, 7 January 2020 (UTC)
    I checked with Clindberg on a similar case, and he said that the clock would have started in 1927. Since it's not an active issue, I don't want to ping him, but I'm pretty sure initial publication was enough.--Prosfilaes (talk) 03:11, 7 January 2020 (UTC)