Wikisource:Scriptorium

Scriptorium
The Scriptorium is Wikisource's community discussion page. Feel free to ask questions or leave comments. You may join any current discussion or start a new one; please see Wikisource:Scriptorium/Help. Project members can often be found in the #wikisource IRC channel (webclient). For discussion related to the entire project (not just the English chapter), please discuss at the multilingual Wikisource. There are currently 407 active users here.

Announcements

Page preloader gadget

There is a new gadget for pre-loading the next scan page. It's mostly cribbed from mulWS, but it is tweaked so that it works in non-edit mode. This speeds up the loading of the "next" link when in the Page namespace. When preloaded, a green underbar is shown on the next page link (see right).

It can be found in the "Editing tools for Page: namespace" section of Special:Preferences#mw-prefsection-gadgets.

Inductiveload (talk/contribs) 11:11, 18 November 2020 (UTC)

This gadget may or may not be needed once this patch lands (hopefully next week): phab:T230689. Inductiveload (talk/contribs) 14:47, 28 November 2020 (UTC)

Gadgetisation of the PageNumbers/Dynamic Layouts code: step 1

The first step of unpicking the dynamic layout and page numbers Gordian knot is to shunt all the code from MediaWiki:Common.js to gadget-based code. This is because Common.js code can't be easily turned off to allow a local or sandbox version to be used for development.

I have now moved the code to a gadget, but you need to take some steps to disable the Common.js code as well. If you would like to test it:

  • Visit Special:ApiSandbox#action=options&format=json&change=userjs-disable-nongadget-pagenumbers=1 and submit the query. You will be prompted to automatically add a CSRF token, just click "submit".
  • This sets the userjs-disable-nongadget-pagenumbers user option to 1, which disables the current code in MediaWiki:Common.js.
  • Page numbers and dynamic layouts should now be off.
  • Visit your gadget prefs and enable the "Display Options" gadget under the "Development" section.
  • You should now see the page numbers and dynamic layouts again. There should be no visible difference between the gadget version and the current version.
  • If you want to go back, disable the gadget and use the API sandbox in the same way, but set userjs-disable-nongadget-pagenumbers to 0.
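For anyone who would rather script the toggle than click through the API sandbox, the request described in the steps above can be sketched in Python. This is only a rough sketch: it assembles the parameters for the MediaWiki action=options call and sends nothing. The helper name build_pagenumbers_toggle is made up for illustration, and obtaining a real CSRF token (which the sandbox fills in for you) is assumed to happen elsewhere.

```python
def build_pagenumbers_toggle(disable_common_js: bool, csrf_token: str) -> dict:
    """Build POST parameters for the action=options request described above.

    Hypothetical helper: it only assembles parameters and never contacts the wiki.
    """
    value = 1 if disable_common_js else 0
    return {
        "action": "options",
        "format": "json",
        # Same name=value pair as in the ApiSandbox link in the steps above.
        "change": f"userjs-disable-nongadget-pagenumbers={value}",
        # The CSRF token must come from a logged-in session (not shown here).
        "token": csrf_token,
    }


if __name__ == "__main__":
    # "PLACEHOLDER" stands in for a real CSRF token.
    print(build_pagenumbers_toggle(True, "PLACEHOLDER"))
```

Passing False instead of True builds the request that sets the option back to 0, matching the last step in the list.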

If anything is not working, please report it here.

If no-one reports any issues in a week or so, we can make the gadget default-enabled, disable the code in Common.js and move forward with the process of uncrustifying the code. Inductiveload (talk/contribs) 14:38, 18 November 2020 (UTC)

@Inductiveload: Works for me. BTW I left some feedback on your user talk page, concerning a situation that arose whilst testing something. ShakespeareFan00 (talk)
The code is now loaded as a default gadget and the MediaWiki:Common.js code is removed.
This means that it is now possible to disable the pagenumbers/layouts code easily, which makes it easier for maintainers to test other versions before going live.
It may take a short while for the changes to propagate through caches. As before, if breakage occurs, please let me know in this thread. Inductiveload (talk/contribs) 13:41, 28 November 2020 (UTC)

Proposals

Collective work inclusion criteria

[This is a proposal stemming from the #Policy on substantially empty works section below.]

Since there has been no more input for a month, here we go. This is only a proposal, so any part of it can be changed, or the whole idea rejected. Inductiveload (talk/contribs) 10:58, 25 August 2020 (UTC)

Inclusion criteria for articles

Some works are composed of multiple parts that can stand alone as independent pages. These works are generally encyclopedias, biographical dictionaries, anthologies and periodicals such as magazines and newspapers. Such "collective works" have slightly different criteria for inclusion in the main namespace. The aims of these criteria are:

  • To allow individually-useful articles, or sets of articles, to be transcribed to the main namespace without requiring active transcription of hundreds of pages of unrelated articles
  • To nevertheless make it easy for other users to "drop in" and add more articles to the work.

To be eligible for inclusion, a component of a collective work (e.g. a single magazine article) should satisfy the following criteria:

  • The component should be "non-trivial" in scope and importance. For example, only a title page or single-paragraph "notice to subscribers" in a magazine is unlikely to be considered useful on its own. However, it would still be part of a full transcription of the rest of the parent unit (e.g. a magazine issue).
  • The work should be scan-backed.
  • Main namespace pages should be created for the work at the top level and any intervening levels (e.g. Volume and Issue/Number ranks should exist). Sometimes, the Issue/Number rank redirects to a section on the Volume page.
  • Front matter of each intervening level (the "parent unit", e.g. a magazine volume and issue) should be transcribed and transcluded
  • A table of contents is required for the parent unit in question. Use {{AuxTOC}} if the original work doesn't contain a TOC.
  • Appropriate infrastructure around the work should exist. This might include internal plain link templates ("lkpl"), dedicated article link templates for use on author pages, formatting templates for repeated formatting elements, etc. All templates should be fully documented.
  • The article should be linked to from any relevant author pages and suitable portals
  • Oppose. An article is a complete work. The only requirement for inclusion should be that it actually is an article. This proposal would result in, for example, the deletion of huge numbers (at least hundreds) of perfectly good short stories and similar articles created over more than a decade for no good reason. I can see no reason for demanding every piece of front matter, which might consist of large quantities of indexes, adverts and other material of no great importance but massive bulk and technical difficulty. Insisting on scan backing would be extremely damaging if a particular article is or should be used as a source for Wikipedia. The need to provide online copies of sources to maintain and improve Wikipedia is overwhelmingly more important than the luxury of scan backing. Requiring the creation of templates would be a crushing burden, because most people do not know how to create them. It is in any event wholly unnecessary. Whether the article is linked to is irrelevant to inclusion. I can understand the desire for a main page that links to the article (and even that would take a lot of effort to effect in some cases where a lot of articles have already been created), but the rest is just obstructive. The problem with this proposal is that it would create a massive crushing burden that is wholly unnecessary and produces no useful benefit to the project or readers. It is burdensome restrictions for the sake of restrictions. James500 (talk) 20:18, 29 August 2020 (UTC)
  • Support. Without a system like the one you have described in place, sub-pages of works could be created wantonly without any means of completing the works from which they were derived. If an article, which is a selection from a larger work, is created without any infrastructure, it will be very difficult for other Wikisourcerors to complete the work which has been started, as they will have to find and upload a scan and set up the complicated not-article material without the aid of the person who created the first article. The new system will also make it easier for other contributors to work on smaller parts of a larger work, without worrying about demanding formatting concerns. TE(æ)A,ea. (talk) 12:30, 30 August 2020 (UTC).
    • Content creation should not be described as "wanton". There are means of completing the works from which the sub-pages were derived. If a periodical article is created without so-called infrastructure, it is very easy for other Wikisource editors to complete the work which has been started. It only becomes difficult when someone goes on a deletion spree. And it is massive numbers of nominations that cause problems. James500 (talk) 18:33, 30 August 2020 (UTC)
      • This page is a fine example of what I refer to. A novel contributor, with no previous involvement with this work, or one like it, would have to generate an entire system for reproducing (transcluding) articles from that work. The example I provide is more complete than other pages, and is much more complete, in relation to the whole work, than a single article. It would be very difficult to add to larger works, where the basis is merely articles or other pages in the state of which I complain. TE(æ)A,ea. (talk) 21:21, 30 August 2020 (UTC).
        Oh sheesh is that happening again. Fully agree with you TE(æ)A,ea that it is wanton and of little value. That content does not belong in main namespace. Main namespace is for transcribed work. Constructs and curation belong in portal namespace. I have created the portal and moved the non-mainspace material. — billinghurst sDrewth 23:17, 30 August 2020 (UTC)
        • That page was created more than a year ago. Nothing is "happening again". You did not move the bibliographic information from the mainspace page to the portal. I had to add it to the portal myself. If that important bibliographic information had been deleted by mistake, that is an example of how seriously disruptive the proposed deletion criteria could be. The word "wanton" is needlessly offensive. The primary meaning of the word "wanton" is "sexually promiscuous" and it is applied to other things by analogy. Please do not use that word. James500 (talk) 00:49, 31 August 2020 (UTC)
          • what's "happening again", is the periodic pearl clutching of the deletionists, who are opposed to an open project, and seek to provide a tl;dr of the "one right way" to do transcription. if a text is useful, and people can work to organize it, then we should include it. put a maintenance category, and move on. making up exclusion rules is a waste of time with the prospect of a growing backlog, or filters turning away newbies. take a look at german wikisource, if you want to know how that turns out. [1] Slowking4Rama's revenge 21:38, 1 October 2020 (UTC)

  Comment @Inductiveload:

The proposal, as is, would inhibit the ad hoc transcription of articles from "The Times", e.g. The Times/1914 and things linked from {{The Times link}}. Is that in or out of scope for your proposal? Maybe there should be a declaration of some governing principles first. What is looking to be achieved, and indications of what is trying to be stopped. Then we can get onto a structure. I know that we created {{header periodical}} to capture where we have more sporadic collections of articles from newspapers. [Now I could be convinced that such constructions are better to be in the portal namespace rather than main ns.]

Some examples of pages considered problematic would be useful for context. If the proposal is an effort to have articles from a periodical becoming part of a hierarchy of the periodical, i.e. subpages, then YES, I fully support that, in contrast to random root-level pages without context to the publication. If the proposal is to set up a fully qualified structure for every periodical where we just want to reproduce one article, then NO. This is self-interest as I regularly want to reproduce an obituary for an author to establish biographical information and we are never going to get all that requisite newspaper construct data, and we are virtually never going to get the scans.

For any newspaper article I have transcribed I will generally do "Periodical name/YYYY/Article name" to give it grounding, and the article would have some "notability". For The Times I did an extra hierarchy level. I will accept that there will be early works that I transcribed that may be incomplete by that standard and I would not transcribe them that way today. — billinghurst sDrewth 15:31, 30 August 2020 (UTC)

To be eligible for inclusion, a component of a collective work (e.g. a single magazine article) should satisfy the following criteria:

  • The component should be "non-trivial" in scope and importance. For example, only a title page or single-paragraph "notice to subscribers" in a magazine is unlikely to be considered useful on its own. However, it would still be part of a full transcription of the rest of the parent unit (e.g. a magazine issue).
  • (struck) The work should be scan-backed.
  • Main namespace pages should be created for the work at the top level and any intervening levels, with a suitable, logical subpage hierarchy developed (e.g. Volume and Issue/Number ranks should exist). Sometimes, the Issue/Number rank redirects to a section on the Volume page.
  • Front matter of each intervening level (the "parent unit", e.g. a magazine volume and issue) should be transcribed and transcluded
  • A means to navigate the subpages of the work is required; a table of contents is preferred, though alternatives exist. (struck) A table of contents is required for the parent unit in question. Use {{AuxTOC}} if the original work doesn't contain a TOC.
  • Appropriate infrastructure around the work should exist. This might include internal plain link templates ("lkpl"), dedicated article link templates for use on author pages, formatting templates for repeated formatting elements, etc. All templates should be fully documented. (additional) Parent template exist to make this readily easy.
  • The article should be linked to from any relevant author pages and suitable portals; (additional) orphaned pages are not acceptable.
    • If an article is orphaned, that is certainly a reason to add links to the relevant author page or portal. It is not a reason to delete the article. Issues that can be addressed in a very straightforward way by adding links to other pages are not suitable for use as deletion criteria. Why would you delete the page instead of just adding the links? This kind of thing belongs in a style guide. I suggest the words "eligible for inclusion" are the problem with some of these criteria. James500 (talk) 01:33, 31 August 2020 (UTC)
      We are wanting to get people to link. We don't delete a work for lack of linking, we are not that petty. What that criterion does is limit the transcription and addition of the trivial; linking indicates that it requires some relevance. — billinghurst sDrewth 14:57, 31 August 2020 (UTC)
    • @Billinghurst: I mostly agree with your formulation - that's more flexible in the case of newspapers. @TE(æ)A,ea.: has already given an example, but there are several more examples in the #Policy on substantially empty works below.
    • I do still think we should be requiring the front matter, but perhaps only when we have scans. Usually, it's just a title page or issue banner, it usually provides the date and number as in the original and it prevents the main-space page being just a floating TOC: e.g. The Chinese Repository/Volume 1 and The Chinese Repository/Volume 1/Number 1, versus, say, The London Quarterly Review/39 (which doesn't have a scan, so it's kind of fair enough in this case, but if it had a scan, it should get the front matter).
    • I was going to disagree with the removal of the scan section, but if it is downgraded to "if possible", since the current global policy is pretty much "scans if at all possible", it doesn't need to be repeated.
    • For clarification: by "Parent template exist to make this readily easy." do you mean things like Template:Authority/lkpl? Inductiveload (talk/contribs) 11:11, 31 August 2020 (UTC)
      I was meaning template:article link primarily as it is more what we have used for journals. template:authority/link is more aligned to dictionaries and the like. But yes, one of those as the parent template, or used directly. If we have a scan, then yes to front matter, so we can qualify in the regard of its existence.
  • I have a question; let's take Golfers Magazine. I expect that there will be exactly one article ever transcribed from this--Ask the Egyptians, by Rex Stout, an obscure short story by a not so obscure author. I'm glad to provide scans; I think we should demand scans for stuff that wasn't originally published digitally. And it will get tucked under a Golfers Magazine/Volume 28/Issue 3/Ask the Egyptians. But how much work do you expect here? I would begrudgingly create a ToC for the issue, but messing with templates seems completely unnecessary.--Prosfilaes (talk) 14:03, 31 August 2020 (UTC)
    Personally I think that scans are nice, maybe preferred, not mandatory. Sometimes getting scans is either not possible, or just problematic. I have numerous newspapers to which I can get access through subscription sites, but producing scans to upload is just MEH! especially if I just want an obituary reproduced. (Noting that where I just want a rough transcription or a snippet that these days I put it on an author talk page.) Have a poke at Category:Obituaries for a range of sources that myself and others have used.

    For your example, I would have gone for "Golfers Magazine/YYYY/article name" and then slapped down {{header periodical}} at the root level, as we get more years, then we can break it down further. — billinghurst sDrewth 14:57, 31 August 2020 (UTC)

  • @Prosfilaes:, what I think would be nice here might be:
    • The top level page, pretty much as it is. Doesn't look like there's much more to say about this work.
    • I can't really see any sensible templates (note "might include" in the proposal) to create for this work. It's not a dictionary so it doesn't obviously need a lkpl, and it's not big enough to merit an article link template of its own. Perhaps if all the headers are identical, there could be a formatting helper, but not critically needed.
    • Personally, I'd like to see the cover if there is one and it's "nice" like this one (obviously not a library binding), and the issue header on the issue sub-page, but I can see the argument that it's a bit pointless if there is no intention to transcribe the rest of the issue. The TOC (which already exists in the original work) is something I'd prefer to see if possible, but I do get that it's a bit of an imposition in this case, where only one article is "interesting".
    • A list of the known scans somewhere (90% of periodicals seem to do this in the mainspace, but that's evidently controversial). It looks like Hathi has an incomplete list and the IA has another Google-fied copy of v.12, so in this case probably just what Hathi has. A lot of the time a mish-mash is needed to get a set of links. Uploading is strictly optional - obviously preferred, but we all know how much of a pain it is, and page-listing and checking periodicals is pretty masochistic, so it's absolutely not needed.
    • Again personally, I prefer "Golfers Magazine/Volume 28/Issue 3/Ask the Egyptians" to "Golfers Magazine/1916/Ask the Egyptians" since we might as well put things in the correct place ahead of time and it provides the obvious place for things like front matter. But I know that's not how it's always done, especially for newspapers where the content is often even more sparse, proportionally speaking, than magazines. Inductiveload (talk/contribs) 15:54, 31 August 2020 (UTC)
      @Inductiveload: If we can get that data, then that is definitely preferred, and I would think that for journals we would encourage it. For newspapers, I doubt that we are going to get the coverage, and they are just a lot harder due to how those beasts are constructed. Probably a case of differing guidance, and differing tolerances. — billinghurst sDrewth 14:23, 20 September 2020 (UTC)
  • Caveat: I was hoping I would find the time to really dig into this and contribute something with some thought behind it, but I keep being disappointed, so instead I'm just going to do the drive-by thing. Sorry!
    I Support Inductiveload's proposal as written. I disagree with Billinghurst's proposed softening, in particular regarding scans. We need to start getting a hard scan requirement (with the obvious exceptions) into policy, and partial works like this are where the requirement is most urgent as it is a de facto requirement for other contributors to be able to work effectively on completing the work. I am open to, and lean towards, removing the templates requirement. Templates are very hard for most people, and a somewhat tall order even for long-term Wikimedians, and I don't consider bespoke templates to be a critical factor.
    I also support soft application of this policy, the same way we allow for {{incomplete}} and {{missing image}}. Billinghurst's concern regarding gigantic efforts required for front and end matter (long tables of contents, indices, etc.) is a legitimate one, but I think this is better handled by softening application than softening the policy. If the text is put in a sub-page structure, is scan-backed, and the front matter is coarsely there, I can live with something like a hypothetical {{toc part missing}} or {{issue toc missing}}. With all the coarse structure in place, filling in detail is eminently doable by crowdsourcing.
    I also stress that I don't consider the establishment of this policy a bright-line immediate cause for deleting existing texts. I oppose an explicit grandfather clause in this policy, but I !vote in favour of it in the context that our practice is not to proactively mass-delete historical texts just because we raise the standard for quality. I do, however, expect that individual texts that do not meet this new policy will be proposed for deletion piecemeal over time, as people happen to run across them, with no progress toward meeting the standard, or are too pathological to fix (which should certainly be the first approach whenever possible). And my expectation is that in those discussions those texts will either be improved to comply with this policy or they will be deleted in accordance with this policy. I also very much expect contributors who disagree with this to express their disagreement politely and constructively: prioritising different factors (e.g. quality over quantity) is in no way shape or form cause for name-calling or ascribing ulterior motives to other contributors. --Xover (talk) 13:22, 22 November 2020 (UTC)
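As a rough illustration of the subpage structure discussed in this thread, the following Python sketch lists the parent pages that would need to exist for a given article title, using the Golfers Magazine example from above. The function name parent_pages is hypothetical, and the sketch assumes that "/" appears in a title only as the subpage separator.

```python
def parent_pages(title: str) -> list[str]:
    """Return every ancestor page of a subpage title, top level first.

    Hypothetical helper; assumes "/" is used only as the subpage separator.
    """
    parts = title.split("/")
    # Join progressively longer prefixes, stopping before the full title itself.
    return ["/".join(parts[:i]) for i in range(1, len(parts))]


if __name__ == "__main__":
    article = "Golfers Magazine/Volume 28/Issue 3/Ask the Egyptians"
    for page in parent_pages(article):
        print(page)
```

For the year-based scheme ("Golfers Magazine/1916/Ask the Egyptians") the same function returns two parents instead of three, which is the practical difference between the two layouts.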

No-content mainspace pages

This one is probably even more controversial so it's a separate proposal:

Collective works are commonly referenced by other works. Due to this, it is permitted to pre-emptively create the top-level main namespace page to collect incoming links, even when there is no content ready for transclusion. This also allows labour-intensive research into location of scans to be preserved and presented to users even when no transcribed work has been completed. The following is required for such a work:

  • A header with a brief description including active dates, major editors, structure (e.g. series) and so on
  • Redirects from alternative names (e.g. when a work has changed name or is referred to by other names)
  • A listing of volume scans should be added, and it should be as complete as possible, based on availability of scans online. As always, creating Wikisource index pages is preferred, but external scans are acceptable.
  • Creating sub-pages (volumes or issues) should follow the article inclusion criteria. This means a sub-page should not be created if there is no content.
  • Oppose. As above, these restrictions are an unnecessary burden that would produce no real benefit and presumably result in a lot of deletions. We do not need lists of editors. We do not need a complete list of volumes. (There may be hundreds of volumes of a particular periodical that have scans. For example, a page with links to scans of twenty volumes should not be deleted because the creator failed to link to scans of another eighty volumes.) Lack of redirects is not a reason to delete these pages either. James500 (talk) 20:37, 29 August 2020 (UTC)
  • Support, mostly. Generally speaking, I think that if a periodical changed its name, then there should be a separate page under the new name; however, redirection pages from alternate titles would be preferable. The other requirements are not overmuch burdensome, and would make useful a page that is otherwise empty, due to a lack of transclusions. TE(æ)A,ea. (talk) 12:30, 30 August 2020 (UTC).
    • None of our periodical pages includes the names of the editors, as far as I am aware. Not one. Under this proposal, every single periodical we have would be deleted. Further, it is not possible to include the names of the editors when they are anonymous. James500 (talk) 18:24, 30 August 2020 (UTC)
      • @James500: "every single periodical we have would be deleted" - or we could make the effort to improve such works as we find them. Generally, an excerpt from Wikipedia or some other source would do just to provide some context. E.g. The Condor vs The Journal of Jurisprudence, which has the dates, but not other useful info, not even the country. For example, even a quick trawl would allow one to write something like "The Journal of Jurisprudence was a Scottish law journal published in Edinburgh from 1857 to 1891. The first successful Scottish law journal, it covered all aspects of the Scottish legal system and included editorials, biographies and short articles as well as case law and reporting of legislation. It merged with the Scottish Law Magazine in 1867. It was largely replaced by the Juridical Review in 1891.". The editors aren't particularly obvious here (so they're not "major editors"), but sometimes editors are important to the work's history and are explicitly noted, e.g. All the Year Round or The New-England Courant.
      • Basically, if a page has zero or near-zero transcribed content, in my mind it can edge over the line into acceptable as long as it's providing useful auxiliary bibliographic information, which might also include collation of various names. This is somewhere WS can actually provide value-add - nowhere else online, as far as I know, provides a venue for this information (IA/Google metadata is terrible, OCLC is not very good at periodicals, Hathi doesn't make downloads easy, none are editable, often a complete scan list uses various sources, etc). However, "it was a periodical and here's a handful of raw external links, kthxbai" doesn't quite cut it, even for someone who thinks these pages can be useful like me.
      • I've said it before several times, but the aim here is not, not, not to get all the pages like The Journal of Jurisprudence deleted, but instead figure out what needs to happen to keep them. To me, a decent blurb and a tidy list of volumes and scans will do it, but that's far from consensus. As it stands, as far as I can tell, the only reason half of Portal:Periodicals isn't getting unceremoniously dumped into Portal space (something I personally would like to find an alternative outcome to) is no one really wants to deal with it. We can fix that by coming up with a minimum level which the pages should meet and then fixing them up. Inductiveload (talk/contribs) 12:37, 31 August 2020 (UTC)
    • @TE(æ)A,ea.: about the names, above is an example, where The Journal of Jurisprudence absorbed the Scottish Law Magazine in 1867. Though technically after the merge TJJ became The Journal of Jurisprudence and the Scottish Law Magazine (e.g. here, but not the title pages), it was still the same work. So in my mind, we could have The Scottish Law Magazine running up to 1867 and then The Journal of Jurisprudence for 1857–1891, with notes about the merge in both headers.
    • Another example of a work that changed name, but remained the same fundamental work is Monthly Law Reporter, which was just The Law Reporter for the first 10 years, and even kept the volume sequencing over the name change (though it added a "new series" number). So The Law Reporter should probably be a redirect. Inductiveload (talk/contribs) 12:37, 31 August 2020 (UTC)
      • The Scottish Law Magazine [and Sheriff Court Reporter] was originally called the Scottish Law Journal and Sheriff Court Record. It has a page already which includes the volumes up to 1867. James500 (talk) 15:10, 1 September 2020 (UTC)
        • @James500: Then a link to it should have been in the description already. I have added it and expanded the description as above. Feel free to add more details. Inductiveload (talk/contribs) 15:50, 1 September 2020 (UTC)
  • Comment: Periodical main namespace pages should not contain the curated information of scans, etc., that is the job of the Portal: namespace. Main namespace should only contain published information for works that we have prepared. So under your proposal, the main ns can exist, and it should contain contents of works that we have transcribed, and there should be a corresponding portal: or there can be a constructed Wikisource: project page where there is a project to do the work. This was discussed years ago, and we have been moving those constructs to portal namespace for years. If there is zero content at the page, and we are unlikely to have it, then it can be redlinked, or maybe if it is that obvious then we don't need a link at all. Examples would be useful. — billinghurst sDrewth 15:42, 30 August 2020 (UTC)
    • You are the only person moving these pages into the portal space. I would like to see a link to the alleged discussion you refer to. James500 (talk) 18:24, 30 August 2020 (UTC)
  • @Billinghurst: I personally don't see huge value in simply shunting just scan links to Portal and leaving them there:
    • It eventually leads to having two parallel volume lists, one with links and one without, sometimes with divergence.
    • It tends to end up with "scratchpad-level" content in Portal, which is supposed to be a nice presentation space.
    • Portals are badly integrated and will probably not be noticed by casual users, or even many Wikisource editors. Especially as the Portal headers never seem to actually link to the mainspace works that exist, but we can fix that.
  • I suggest Portals like Portal:Punch provide some useful value-add, whereas Portal:Notes and Queries does not (yet), and its current content, if anywhere, should be on a WikiProject, just on the mainspace talk page, or even nowhere now all the volumes are uploaded. If the consensus truly is to shunt this all to Portal and move back once there's content, then fine, but I do wonder if that's truly the most ideal strategy. From a pure "only reproduced content in mainspace" angle, perhaps, but does that serve readers best? Inductiveload (talk/contribs)
    @Inductiveload: Main namespace is content for the reader. There is nothing worse for a reader to go to a page and have to drill down multiple pages to find that there is no content just some dashed skeleton of hierarchy. Main namespace is not built to drive transcribers and transcriptions, that is our other content spaces. We can create a page there once we have content to display what we have to read, and point to the portal for what we have to transcribe. It is the reason we put in place the portal namespace. — billinghurst sDrewth 15:08, 31 August 2020 (UTC)
    I also wish to avoid the really ugly situation of people uploading a work, creating the front page, and then just leaving it for other people. That facadism of a work is just problematic, and we know that nothing happens to it. It is why we developed {{ext scan link}} and {{small scan link}} for use in the author namespace to do that role of managing that list build. So portal and author namespaces play that role and keep main namespace cleaner and more functional. — billinghurst sDrewth 15:15, 31 August 2020 (UTC)
  • @Billinghurst: I'm not saying that we should be creating pre-emptive "empty" hierarchies. I'm saying that I don't really see the point of shunting all the scan links off to a portal where they will basically never be found by anyone who isn't extremely familiar with Wikisource and the mainspace/portal split. If a casual reader is after, say, Volume 22 of The Atlantic Monthly, for which we have neither scans nor content, do we serve them better by placing a scan link to the IA on the mainpage next to the redlink so that they can at least find what they wanted, or is it better to have no redlink at all, skip Volume 22 in the list and maybe put the IA link at a portal? If the latter, I'm fairly certain 95%+ of people will just not find that link at WS. We can certainly adopt a stance of "if it doesn't exist here, we don't even want casual readers to be presented with an external resource", but that seems slightly walled-gardenish for an open project.
  • "Facadism" is annoying, and it (or the perception of it) is what has brought us to this point via the proposals at WS:PD. As an example from that page, I don't find the concept of the page American Law Review intrinsically offensive in mainspace, even without any content (though perhaps it's a little untidy as-is), but I don't really see the point of American Law Review/Volume 1 as it stands (only a title page and redlinked TOC, though it's a single article away from being useful to me).
    • Notably, I find "facadism" of a collective work much less annoying than, say, only having the preface to a novel. Collective works can have individually-useful things slotted in bit by bit, and if there's a framework around the work, it's even easy to do.
  • And if we do want to ditch this proposal and be strict with Portals in this way, then 1) it needs to be documented that that's how it works (Wikisource:Portal guidelines and Help:Portals don't mention use of Portals for this purpose at all, they focus more on thematic curation) and 2) most existing periodicals need to be converted over: many people reasonably imitate existing structures, and we can't blame them for that.
  • And do we allow redirection from a non-existent mainspace page to the portal so it can be found via "normal" linking until such time as there is content? Inductiveloadtalk/contribs 17:09, 31 August 2020 (UTC)
  • The word "facadism" is needlessly offensive and should be deprecated in favour of something that doesn't sound like it refers to habitual dishonesty. I would urge that care be taken when coining neologisms to consider how these words might be taken. James500 (talk) 15:32, 1 September 2020 (UTC)
    What? It means that there is a face only. Nothing more. There is nothing offensive in it and I don't even see where you can draw that inference. You are digging too deep or looking for insult. Front-pageism is meh! So unless you can find a better term, can you please AGF. — billinghurst sDrewth 18:58, 1 September 2020 (UTC)
  •   Oppose I disagree with Inductiveload's position, and agree with Billinghurst's (provided I have understood them both correctly, which is not a certainty). We should significantly raise the bar in this area for mainspace pages, and anything that is not (part of) an actual published work should be shunted to other namespaces. I acknowledge the downsides to that approach that Inductiveload brings up, but I think we should find other ways to ameliorate those. I also agree that the main purpose in setting a higher bar is to have a clear and predictable standard for contributors to aim for to enable keeping a work, with deletion being an admission of failure (i.e. deletion is a sometimes necessary, but never a desirable, outcome). I disagree that shunting content to other namespaces is a bad thing, as it is a great way to preserve content that would otherwise be deleted. Maintaining clear purposes for the namespaces makes possible technical innovation in the long term, through better integration with Wikidata and similar measures. --Xover (talk) 13:44, 22 November 2020 (UTC)

Bot approval requests

Repairs (and moves)

Designated for requests related to the repair of works (and scans of works) presented on Wikisource

Other discussions

PD-anon-1923 again

The discussion of Happy Public Domain Day! has slipped into the archives without reaching a conclusion, so I would like to remind everyone that the last suggestion in the above-mentioned discussion was to create {{PD-US|year of death}} and deprecate {{PD/1923}} and {{PD-anon-1923}}. Is this solution OK?

BTW: if we decide to keep calling the license templates for pre-1925 works {{PD/1923}} and {{PD-anon-1923}}, it would be necessary at least to adapt the latter one so that it could be used for 1924 anonymous works too. --Jan Kameníček (talk) 16:21, 20 February 2020 (UTC)

  Support the change — I don't really care but it makes sense —Beleg Tâl (talk) 16:36, 20 February 2020 (UTC)
  •   Support likewise —Nizolan (talk) 01:54, 21 February 2020 (UTC)
  •   Oppose because the name emphasizes US. The point of the templates is to cover both US status and international status. A template that names the US will cause confusion, especially to newcomers. --EncycloPetey (talk) 02:02, 21 February 2020 (UTC)
    @EncycloPetey: So in your opinion, even fixing a math error requires consensus? Without consensus we should believe 1+1=3 rather than 1+1=2? --Liuxinyu970226 (talk) 01:37, 1 April 2020 (UTC)
    Changes to established templates require consensus. We've had previous discussions and the community is divided on the issue concerning these templates. Proceeding with a change when the community has expressed such division is inappropriate because of the community discussion, not because of my opinion. --EncycloPetey (talk) 02:05, 1 April 2020 (UTC)
  •   Support. We are US-centric in our copyright approach. Given the number of times I've had to look up these type of templates here and on Commons, I might buy the idea that we should copy them, but otherwise, I think this is going to be as non-confusing as we get.--Prosfilaes (talk) 04:35, 21 February 2020 (UTC)
  •   Comment In your proposal, how do we code the year of the author's death for anonymous works? --EncycloPetey (talk) 04:38, 21 February 2020 (UTC)
    I am afraid I do not understand the question: anonymous works do not have any known author. I propose that for anonymous works we would have a template with wording similar to {{PD-anon-1923}}, but it would be called {{PD-anon-US}}. --Jan Kameníček (talk) 09:42, 21 February 2020 (UTC)
    That's also problematic, because the US is just one place that we display license information for. The current template displays that information for both the US and for countries with 95 years pma. --EncycloPetey (talk) 19:46, 21 February 2020 (UTC)

  Comment If there is a consensus to act, my recommendation is that we just move/rename the templates

  • pd/1923|yyyy -> PD-US|yyyy, yyyy=YoD, displays two templates as now
  • PD-1923 -> PD-US, where no $1 parameter it displays the one template
  • PD-anon-1923 -> PD-anon-US|yyyy, year of publication

and update the documentation around the place. Do any required tidying of the templates' internals, and fix double redirects. No need to deprecate anything, just move to the new nomenclature, and not worry about any of the old usage, or anyone continuing its use, as it matters not. — billinghurst sDrewth 11:15, 21 February 2020 (UTC)

  •   Oppose Firstly, because of the US emphasis. Yes, we follow US copyright law, but we also serve an international readership, not to mention contributors who are also bound by the copyright laws of other countries. Secondly, I think replacing "PD-1923" with "PD-US" is confusing. "PD-US" sounds like a generic template for "this work is PD in the US", but under this proposal it would mean "this work is PD in the US for the specific reason that it was published more than 95 years ago". BethNaught (talk) 22:16, 21 February 2020 (UTC)
    I do not understand in what way "the readership" is concerned in this… They see only the text of the template which is going to stay the same. --Jan Kameníček (talk) 23:08, 21 February 2020 (UTC)
      Comment I do not think that the suggested name of the template is more American-centred than the old one. E.g. {{PD/1923|1943}} has two parts: "1923" is the American part referring to the American copyright laws, and the parameter "1943" is international, referring to the countries where PD depends on the year of death. Nothing would change, only the American part would be called "US" instead of the nowadays nonsensical 1923. I really do not see any problem in that. --Jan Kameníček (talk) 23:08, 21 February 2020 (UTC)
    @BethNaught: The thing is that the only consideration we give to copyright compliance with regard to hosting is to the US copyright. Unlike Commons, we don't really care whether it is copyright in the country of origin. It is for this reason that I am reasonably comfortable with just stating PD-US and variants. The additional PD-old-70 and variants are for information only. — billinghurst sDrewth 00:43, 22 February 2020 (UTC)
  •   Comment I think this is an important issue, and I'd like to weigh in. I'm probably as familiar as (almost) any Wikimedian with the considerations around copyright law in various countries. But I do not see a clear statement of what the problem is that we're aiming to solve, or what the pros and cons are. I'm sure if I took an hour or two to dig through various archives, I could probably figure it out, but I'm not likely to have the time for that...nor should we expect every voter to do that. So given all that, I'm inclined to gently oppose, simply because I can't figure out what's going on, and it seems unwise to make a change that is difficult for community members to evaluate. Is it possible to sum up the issues more concisely so that I can give it more proper consideration, without having to do all the research myself? -Pete (talk) 22:44, 21 February 2020 (UTC)
    The problem I see is this: Until 1923 it made good sense to have a template called PD-1923, because it referred to the fact that only pre-1923 works are in the public domain. However, the situation has changed: currently the cutoff is 1925-01-01 (or 1924-12-31) and it shifts every year. I find it very confusing to call the template for pre-1925 works PD-1923 (why 1923???). At the same time it does not make sense to change the name of the template every year (PD-1923, …, PD-1925, …); it would be better to find a fitting universal name. --Jan Kameníček (talk) 23:16, 21 February 2020 (UTC)
    Ah, that's very helpful @Jan.Kamenicek:, thank you. I had misunderstood, I thought you were proposing a change to the functionality in addition to the name change.
    I agree that changing the name (a) such that it specifies "US" and (b) such that it references the 95 year rule, rather than the (now outdated) 1923 rule would be worthwhile. I agree with others that we should be cautious about US centrism; but the reality is, with a current title that assumes that it relates to US law, without stating it, we already have a high degree of US centrism in the title. In my view, it's better to state "US" as part of the name, to make it clear to editors (who are the primary audience for a template name) that it's about US law. So, my suggestion would be {{PD-US-95}} or similar. That conveys that it's about US law, and it's about the 95 year rule. Text on the template page/docs could clarify that the 1923 rule is now outdated, and subsumed under the 95 year rule.
    A related issue that I find confusing: I don't understand why we need two separate templates for {{PD-1923}} and {{PD/1923}}. I think this proposal only relates to the latter; would we be leaving PD-1923 intact? A decision on this is probably a matter for a separate discussion, but I'd like to know for sure what the intent of this proposal is. -Pete (talk) 23:45, 21 February 2020 (UTC)
    PD-1923 has no decision-making; it applies just a single template and does not add the PD-old-nn variants. It has been utilised where we have been unable to determine a date of death, or for corporate publications which do not have PMA decisions. I addressed above that they would morph into PD-US, though we would need to handle them as parameterless. — billinghurst sDrewth 00:51, 22 February 2020 (UTC)
    Jan, that's not quite correct. Works published before 1923 are still in PD in the US for the same reason they were before. The 1923 date was a cutoff date beyond which we have never had to check. What has changed is that works that were under copyright later than that (from 1923 and 1924), and had their copyright renewed at one point, have now had that copyright protection expire. The works published before 1923 were not eligible for renewal and entered PD for a different reason than the works published in 1923 and 1924. It is one view to see the date as a shifting cutoff, but the cause of works from 1923 and 1924 entering public domain is actually different from those that were published prior to 1923. --EncycloPetey (talk) 03:13, 22 February 2020 (UTC)
    All works published more than 95 years ago are out of copyright because of the time since publication, no matter whether that's due to copyright notices, or renewals, or being in copyright for a full long term. For a work published before 1923, we've never been concerned about copyright notices or renewals, nor how long work published with copyright notice and renewal got in copyright. Why does it matter that a work published in 1924 may have got 95 years of copyright, whereas a work published in 1922 may have only got 75, when we don't really care about that 95 or 75 in the first place? We have no tag for "published abroad before non-US works got copyright in the US in 1891", because we don't care; it has always been sufficient for our purposes to say that it was published before 1923, and I don't see why it is not now sufficient to say that it was published more than 95 years ago.--Prosfilaes (talk) 04:59, 22 February 2020 (UTC)
    @Prosfilaes: I am presuming that this is in reference to the primary notice about copyright within the US, not the secondary notice for PD-old-nn which relates to copyright elsewhere in the world. The secondary notice can still apply for those of us not in the US, which is why we added it. — billinghurst sDrewth 05:08, 22 February 2020 (UTC)
    Yes, the primary notice. There's no need to worry about now-historical features of non-US countries, but certainly helpful to list the years since death.--Prosfilaes (talk) 05:18, 22 February 2020 (UTC)
    Yes and no. There are authors who have works published prior to 1925 who died late enough to still have works in copyright in their home country, so those notices are still very pertinent per Category:Media not suitable for Commons. — billinghurst sDrewth 05:30, 22 February 2020 (UTC)
    Right; I didn't mean to imply we should change the current secondary notices.--Prosfilaes (talk) 06:42, 22 February 2020 (UTC)
  •   Support U.S. copyright is of primary concern to Wikisource. Fixing the license so more 1923 and 1924 works appear on Wikisource even if still under copyright in other countries is so important. Abzeronow (talk) 19:46, 16 March 2020 (UTC)
  •   Support as this seems like the least problematic solution to the problem, and it doesn't make sense for us to keep delaying a resolution. Kaldari (talk) 18:09, 14 April 2020 (UTC)
  •   Comment It looks as though some people are hedging their bets: arguing for deprecating the template on the one hand but arguing for improving the template on the other. Since the template content has now changed, before this discussion has concluded, then procedurally we should recast all votes, since the template named in this discussion thread no longer has the content it had at the start of this discussion. --EncycloPetey (talk) 20:42, 24 April 2020 (UTC)
    Hedging their bets? It is somehow improper to try and improve Wikisource for now, whether or not this template gets deleted? If we're going to get pedantic about policy, where is it written on the English Wikisource that we should recast all votes?--Prosfilaes (talk) 06:41, 25 April 2020 (UTC)
    No need to restart the votes, as the changes have been reverted. The template is the same as it was before the voting started. No changes should be made to any template if there is a discussion and voting ongoing about its future. If the changes were allowed and at the same time we would have to restart the voting after every change, we may never come to a conclusion; not everybody has time to vote about the same problem again and again. --Jan Kameníček (talk) 09:50, 25 April 2020 (UTC)
  •   Support If a consensus is needed to fix math errors, so be it. --Liuxinyu970226 (talk) 09:01, 7 May 2020 (UTC)
  •   Comment Please note that the new date, 1925, applies to all works except sound recordings (and maybe architecture). The date for sound recordings is 1923. That isn't shown in the local summary of the Hirtle chart, but is in the original. (I dropped a more detailed comment below.)--Sphilbrick (talk) 14:29, 20 July 2020 (UTC)
    Interesting point. If it is really so and if we need to show a license for sound recordings somewhere, we would probably have to create a specialized template for them.--Jan Kameníček (talk) 11:44, 2 December 2020 (UTC)
    Yeah. Sound recordings have a tortured history in US copyright law, but the end point is that the first recordings to have their copyright expire in the US will be in 2022, for those published before 1923. See w:Public_domain_in_the_United_States#Sound_recordings_under_public_domain.--Prosfilaes (talk) 00:51, 3 December 2020 (UTC)

So it seems to me that there is a weak consensus for the change. If so, it might be better to make it before the end of the year, so that works newly entering public domain can already be added with new templates.

The less important change is renaming the templates from {{PD/1923|year of death}} and {{PD-anon-1923}} to {{PD-US|year of death}} and {{PD-anon-US}}. It is only a change of the names of the templates; what the readers see will not be affected by this.

The more important change is adapting the latter one so that it automatically calculates the cutoff year as {{CURRENTYEAR}}-95, similarly to what has been done e.g. here.

--Jan Kameníček (talk) 11:44, 2 December 2020 (UTC)
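
For illustration, the {{CURRENTYEAR}}-95 calculation mentioned above could be sketched in template wikitext roughly as follows. This is only a minimal sketch, not the actual template code: it assumes the ParserFunctions extension (which provides #expr) is available, and the surrounding wording is invented for the example rather than taken from {{PD-anon-1923}}.

```wikitext
<!-- Sketch only: assumes the ParserFunctions extension (#expr) is installed. -->
<!-- The sentence wording is hypothetical, not the text of the real template. -->
This work was published before January 1, {{#expr: {{CURRENTYEAR}} - 95}}.
```

Viewed in 2020, the expression evaluates to 1925, which matches the cutoff discussed above, and it would advance by one each January without the template needing to be renamed or edited.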

Policy on substantially empty works

[This is imported from WS:PD, where it applies to multiple current proposals, and several other works].

We have quite a few cases of works that are "collective" or "encyclopaedic" in that they comprise many standalone articles of individual value, which are basically just "shell pages", with no substantial content of any sort, not even imported scans or Index pages. For example, and this isn't intended to make any statement about these specific works, they're just examples and they may well get some work done soon during their respective WS:PD discussions:

Based on the usual rate of editing for things like that, unless dragged up into a process like WS:PD, they'll remain that way a very, very long time. I think perhaps there might be a case to host a mainspace page for this work, even though there is zero, or almost zero, actual content. Do we want:

  • Mainspace pages where there is a tiny bit of information like header notes, scan links and maybe detective work on the talk page (not in this case). This provides a place for people to incrementally add content. Also gives "false positive" blue links, since there is actually no "real" content from the work itself, or
  • Do not have a mainspace page until there's some content. Only host this in terms of author/portal scan links, much like we do for something like a novel.

Personally, I lean (gently) towards #2, but with a fairly low bar for how much content is needed. Say, Indexes, basic templates, a title page and one example article. Ideally, a completed TOC if practical, especially for periodical volumes/numbers. It is fair to not wish to transcribe entire volumes of these works, it is fair to not want to import dozens of scans when you only wanted one, it is fair to only want an article or two, but it's not fair, IMO, to expect the first person who wants to add an article to have to do all the groundwork themselves, despite having been lured in with a blue link. That onus feels more like it should be on the person creating the top-level page in the first place.

I do see some value in periodical top pages with decent lists of volumes and scans where known, because these are often tricky and fiddly to compile from Google books/IA/Hathi, so it's not useless work, even if there are no imported scans (though imported is better than not).

We currently have a large handful of collective works listed for deletion right now in various levels of "no real content", and, furthermore, every single periodical that gets added can fall into this situation unless the person who adds, so I think we could have a think about what we really want to see here. Inductiveloadtalk/contribs 15:43, 3 July 2020 (UTC)

  • I believe that, if there is no scan as an Index: page, the main-namespace page should not exist unless it is being actively completed or is already mostly completed. A few pages (of the volume itself) is not very helpful, and is entirely useless if there is no scan given. TE(æ)A,ea. (talk) 15:59, 3 July 2020 (UTC).
  • I think such preparatory information would ideally be on more centralized WikiProject pages (for the broad subject), both for clarity and to assist in keeping different efforts consistent -- but that it certainly should be retained as visible to non-admins. I think that the red vs blue link issue is minor (but not totally negligible) and outweighed by the disadvantages of hiding the history of previous efforts. I strongly encourage redirecting such pages to appropriate WikiProject pages (after copying over the details there). JesseW (talk) 18:11, 3 July 2020 (UTC)
  • @JesseW: I agree that history shouldn't be deleted, but I think we should approach this in terms of what we want to see from these works, rather than what to do with the handful of examples at PD. There are hundreds of periodicals we could have but don't, and this applies to those as well. If we can come to a conclusion about what is and isn't wanted, we can make all the deletion requested works conform to that easily enough. Inductiveloadtalk/contribs 20:55, 3 July 2020 (UTC)
  • I think these pages are necessary to list index pages and external scans of multi-volume works (such as encyclopaedias and periodicals) especially if they are wholly or partly anonymous or have many authors or are simply large. I think it makes no difference whether such pages are in the mainspace, the portal space or the project space (except that it is harder to find pages outside the mainspace). The point is that these works often have so many volumes (often dozens or hundreds) that they must have their own page, and cannot be merged into a larger portal or wikiproject. If the community starts insisting on index pages, what will happen is the rapid upload of a large number of scans for the periodicals that already have their own page. Likewise if the community insists on transclusion. I also think it is reasonable to have a contents page in the mainspace, as it allows transclusion of articles. Most importantly, new restrictions should not immediately apply to existing pages that were created before the introduction of the restrictions. This is necessary to prevent a bottleneck. James500 (talk) 23:55, 3 July 2020 (UTC)
move the works to a maintenance category, and i will work them; delete them and i will not: i find your sword of Damocles demotivating. Slowking4Rama's revenge 01:55, 5 July 2020 (UTC)
@User:Slowking4: I am not proposing a sword of Damocles. I agree that the imposition of deadlines is counter-productive. I do not support the deletion of any of these pages. I would prefer to see them improved. James500 (talk) 04:38, 5 July 2020 (UTC)
TEA is on his usual deletion spree. not a fan. will not be finding scans to save texts, any more. he can do it. Slowking4Rama's revenge 00:15, 6 July 2020 (UTC)
The entire point of moving this here, and not staying at WS:PD, is to decouple from the emotions that get stirred up in a deletion discussion. Let's keep deletion out of this. If we come up with some idea of what we do and don't want, then we can go back to WS:PD and decide what to do. I imagine that all that will be needed will be a fairly limited amount of housework to bring those works up to some standard that we can decide on here, and all the collective works there will be easy keeps. Hopefully with some kind of consensus that we can point at to outline a minimum viable product for such works going forward. There are hundreds and thousands of dictionaries, encyclopedias, periodicals and newspapers that we could/will, quite reasonably, have only snippets of. How do we want to present them? What, exactly, is the minimum threshold? Let's head off all those future deletion proposals at the pass, because deletion proposals often cause friction. Inductiveloadtalk/contribs 00:47, 6 July 2020 (UTC)
and yet deletion is the default method to "motivate" quality improvement. i reject your assertion that "emotions get stirred in a deletion discussion", rather, anger is a valid response to a repeated broken process being kicked down on the volunteers. it is unclear that a minimum threshold is necessary, rather a functional quality improvement process is. until we have one, you should expect to see this periodic stirring of emotions, as the non-leaders act out. Slowking4Rama's revenge 11:53, 9 July 2020 (UTC)
@Slowking4: Thank you for presenting this opinion, and I'm sorry if I have not made myself clear. We do need to figure out how to avoid a de-facto process of using WS:PD as an ill-tempered ad-hoc venue for "forcing" improvements on people who have somehow managed to generate works that are so in need of improvement that another user has nominated them for deletion. Please also consider looking at #Re-purpose_WikiProject_OCR_to_WikiProject_Scans for an idea to have a "functional quality improvement process" to which such works could be referred upon discovery rather than kicking them straight to WS:PD. If you have other ideas or you have previously suggested something similar to address these frustrations, you could detail them there. Personally, I think we should always prefer improvement over deletion. Exactly what the remediation is (refer to a putative WP:Scans, WS:Scriptorium/Help, directly WS:PD as now, or something else) is not what this thread is for. This thread is for discussing what, if anything, should be the tipping point for deeming a page "lacking" and doing something about it, whatever "something" is. I don't think I can be much clearer that this is not about deletion. If we also have a better venue for improvements, then that's even better.
For example, my personal feeling and !vote on A Critical Dictionary of English Literature is "keep and improve", despite it lacking scans or even links to scans, having only one article and no other content, not even a title page: in short, failing almost every criterion suggested so far in this thread. The only thing it does have is good text quality of the one entry. I personally do not think this work should be deleted, but I do think it should be improved in specific ways. The first half of that sentence is not the focus of this discussion, the second half is. Inductiveloadtalk/contribs 14:18, 9 July 2020 (UTC)
deletion threat has been a habitual method of communicating by admins since the beginning of the project. and text dumps have been habitual, following the Gutenberg example. culture change and process change would be required to change those behaviors. we could make it easier to start scan backed works, but the wishlist was not supported. Slowking4Rama's revenge 21:00, 14 July 2020 (UTC)

I don't think this needs to be much of an issue going forward -- we all agree that it's OK to create Index pages for scans, even if none of the Pages have been transcribed yet; so the only case where this would come up is recording research where no scan has yet been identified as suitable to be uploaded. And for that, I still think a WikiProject page is the right location, not mainspace. (Or, if you must, your userpage.) JesseW (talk) 00:59, 6 July 2020 (UTC)

I realized I may not have been clear enough here -- in my view, the ideal process goes like this:

  1. Decide on a work you are interested in (in this case, a periodical/encyclopedic one) -- don't record that anywhere on-wiki (except maybe your user page)
  2. Find and upload (to Commons) a scan of one part/issue/etc of the work.
  3. Create a ProofreadPage-managed page in the Index: namespace for the scan. (You can stop after this point, without worry that your work will later be discarded.)
  4. EITHER
    1. Put further research (on other editions, context, possible wikification, etc.) on that Index_talk page.
    2. Proofread a complete part of the scan (an article from the magazine issue, a chapter from the book, an entry from an encyclopedia, etc.) and transclude it to the mainspace (and create necessary parent pages), and put the further research on the Talk: page of the parent mainspace entry.

If you can't find any scan, and don't want to leave your working notes on your user page, put them on a relevant WikiProject's page.

If you come across such research done by others and misplaced, follow the above process to relocate it to an appropriate place, then redirect the page where you found it to the new location. That's my proposal. JesseW (talk) 01:08, 6 July 2020 (UTC)

@JesseW: It's not clear to me in your above whether when you use the term "index" you refer to a ProofreadPage-managed page in the Index: namespace, or a general wikipage in the main namespace on which an index-like structure (and/or a ToC, or similar) is manually created. Could you clarify? --Xover (talk) 05:14, 6 July 2020 (UTC)
I meant the namespace. Clarified now. JesseW (talk) 05:17, 6 July 2020 (UTC)
  • Hoo-boy. Y'all sure know how to pick the difficult issues…
    My general stance is that: 1) scans and Index: (and Page:) namespace pages have no particular completion criteria to meet to merit inclusion, and can stay in whatever state indefinitely (there may be other reasons to get rid of them, but not this); and 2) the default for mainspace is that only scan-backed complete and finished works that meet a minimum standard for quality should exist there.
    That general stance must be nuanced in two main ways: 1) there must be some kind of grandfather clause for pre-existing pages; and 2) there must exist exceptions for certain kinds of works that meet certain criteria. I won't touch on the grandfather clause here much, except to say I'm generally in favour of making it minimal, maybe something like "No active effort to get rid of older works, but if they're brought to PD for other reasons they're fair game". The design of a grandfather clause for this is a whole separate discussion, and an intelligent one requires analysis of existing pages that would be affected by it. It is always preferable to migrate pages to a modern standard, so a grandfather clause is by definition a second choice option.
    Now, to the meat of the matter: the exceptions…
    We have a clear policy to start from: no excerpts. Works should either be complete as published, or they should not be in mainspace. But quite apart from the historical practices that modify this (which are somewhat subjective and inconsistent, so I'll ignore them for now), there are some fairly obvious cases that suggest a need for more nuance than a simple bright-line rule alone provides. The major ones that come to mind are: 1) massive never-completed projects like EB1911 or the New York Times (EB because it's big; NYT because new PD issues are added every year); 2) compilations or collections of stand-alone works with plausible claim to independent notability.
    For encyclopedias and encyclopedia-like things, we have to accept some subsets due to sheer scale of work. But when that is the grounds for exception, there needs to be some minimum level of completion. I'm not sure I can come up with a specific number of pages/entries or percentage, but it needs to be more than just a single entry (and, obviously, only complete entries). For this kind of exception to apply, I think it needs to be a requirement that the framing structure for it is complete: that is, the mainspace page should give a complete overview of the relevant work even if most of it is redlinks. That includes title pages and other prolegomena when relevant. For a periodical like the NYT, that means complete lists of issues with dates and other such relevant information (e.g. name changes etc.). For preference, these kinds of things should be in Portal: namespace or on a WikiProject page until actually complete, but that will not always be practical (EB1911 and NYT are examples of this). Mainspace or Portal:-space should never contain external links (i.e. to scans) or links to Index: or Page: space (except the implied link of transclusion and the "Source" tab in the MW UI provided by ProofreadPage).
    For exception claimed under independent notability there are a couple of distinct variants.
    Newspaper or magazine articles need to have a certain level of substance in addition to a specific identifiable byline (possibly anonymous or pseudonymous, and possibly identified after the fact by some other source, such as the Letters of Junius) in order to qualify. It is not enough to ipso facto be a newspaper article, a magazine article, a poem, or an encyclopedia entry. On the one hand we have things like dictionaries and thesauri, where an entry could be as little as two words. Or a one-sentence notice without byline in a newspaper. Or two rhymed lines (technically a poem) within a 1000-page scholarly monograph.
    To merit this exception it should be reasonable to argue that the "work" in question should exist as a stand-alone mainspace page (not that we generally want that; but as a test for this exception, it should be reasonable to make such an argument). This would clearly apply to moderately long entries in the EB1911 written by a known author that has their own Wikipedia article. It would apply to short stories or novella-length serialisations in literary magazines by authors that have later become famous (or "are still …"). It would apply to various longer-form journalistic material from identifiable journalists (again, rule of thumb is notable enough for enWP article), including things in magazines that have similar properties. For most periodicals the most relevant atomic (indivisible) part is the issue, not the entry or article, but with some commonsense exceptions.
    It would, generally, not apply to things that are works by a single author, like a scholarly monograph that just happens to be arranged in "entries" rather than chapters. It would not apply to things that are essentially lists or tables of data. It would not apply to short entries in something encyclopedia-like or entries that are not by an identifiable author. The OED for example, iirc, is a collective work where entries are by multiple not individually identifiable authors (and each entry is mostly very short too); only the overall editor is usually cited.
    For works claiming this exception too the framing structure should be complete, even if most of it is redlinks. The same general rules about Portal:/WikiProject and no external or Index:-space links apply. An exception would be for periodicals where new issues enter the public domain every year; and we should generally avoid including even redlinks for the non-PD issues here (but may allow them in a WikiProject page). For non-periodical works in multiple volumes where some volumes were published after the PD cutoff, including listings for the non-PD volumes (but not links to scans; those are a copyvio issue) is ok.
    Poems, short stories, and novellas are a special class of works here. A lot of these were first published in a magazine (possibly serialized), and a lot of them exist as multiple editions in substantially the same form. Some exist in multiple versions. These should all primarily exist the same way as chapters as part of their various containing works; but there are some cases where we might want to have, for example, a series of connected pages of the poems of Emily Dickinson. I am significantly ambivalent about this practice, as it amounts to making our own "edition" or "collection" of her poems (in violation of several of our other policies), but I acknowledge that it is an established practice and it is something that has definite value to our readers. It may be that it is actually a practice that should be governed by its own dedicated policy rather than handled within these other general policies.
    For the sake of example; applying this to the works Inductiveload listed at the start of this thread would shake out something like this:
    Auction Prices of Books—This work appears to have no sensible subdivisions and is in any case by a single author. I see no obvious reason to grant this work an exception, except under sheer volume of work and even there I would want to see both a substantial proportion completed and some kind of ongoing effort towards completion (no particular time frame, but definitely not infinite and definitely not as an effectively abandoned project). In a deletion discussion I would very likely vote to delete the mainspace pages here (but, as nearly always, to keep the Index: and Page: namespace artifacts). I don't see this as a reasonable candidate for a Portal:, nor really a good fit for a WikiProject (though I probably wouldn't object to a WikiProject if someone really wanted one).
    Central Law Journal/Volume 1—A single volume is too little, so I would want to see a complete structure for the entire Central Law Journal, with level of detail for each volume similar to the one existing volume. Each article in the journal can be individually considered for a stand-alone work exception; but for the collection I would want to see at minimum a full issue finished to justify having the mainspace structure, and preferably multiple issues (in a deletion discussion I might insist on multiple issues). Index: and Page:-space artefacts can, of course, stay. A Portal: might make sense for selections from the journal, of articles that meet the standalone work exception. A WikiProject to coordinate work and track links to scans etc. might be a decent fit here, if someone wanted that. As it currently stands I would probably vote delete for the mainspace artefacts (with option to move whatever content has reuse value to a non-mainspace page for preservation; and undeleting if someone wants to work on something is a low bar).
    A Critical Dictionary of English Literature—The top level mainspace page has near-zero value, existing only to link to the single transcribed entry. For a credible claim to exception to exist it would need to be a complete framework for the work as a whole, and significantly more than a single entry must be complete. I would probably also want to see ongoing work, unless a substantial percentage of the entries were complete. The single finished entry is eligible to claim a standalone work exception, but I think it probably would not meet my bar for that (I might be wrong; and the rest of the community might judge it differently). In a deletion discussion I would probably vote to delete all the mainspace artifacts here (as always keeping Index:/Page: stuff) but with a definite possibility that I might be persuaded on the one completed entry (an absolute requirement for convincing me would be to scan-back it: as a separate issue, my tolerance for grandfathering of non-scan-backed works is small, and effectively zero for new/non-grandfathered works).
    Bradshaw's Monthly Railway Guide—Would need a full framework and a number of individual issues finished to merit a mainspace page. I see no credible subdivisions for a standalone work exception, but might be persuaded otherwise if, say, one of the train tables was used as a (reliable primary) source in a Wikipedia article (implying some sort of notability beyond just being raw data). In a deletion discussion I would probably vote to delete all mainspace artifacts here. If anyone made the argument, I would entertain the notion that there is value in treating train tables like poems, and hosting a series of train tables like we do Dickinson's poems; but that would require a substantial number of them completed.
    For everything above my stance is nuanced by a willingness to accept temporary exceptions for things that are actively being worked: active being operative, but with no particular deadline to complete the work. We have differing amounts of time available, and some works are so labour-intensive or tedious to do, that my personal threshold for "active" is a pretty low bar to clear. If it's months and years between every time you dip in and do a bit I might start to get antsy, but days or weeks probably won't faze me. And that the projected time to completion is very long at that pace is not particularly a problem so long as it is not infinite. Within those parameters I would always tend to err on the side of letting contributors just get on with it in peace, regardless of any of the policy-like rules sketched above.
    I also want to emphasise that I think this is a very difficult issue to deal with. There are a lot of competing concerns, and a lot of grey areas that will likely take individual discussions to resolve. My balance point on this issue is partly formed by a broader concern about our overall quality (we have waay too many works of plain sub-par quality, and too many not up to modern standards) and a hope that by preventing the creation of these kinds of works (rather than deleting them after creation) we will be able to retain the good and desirable exceptions without dragging down quality, and without the traumatic and stressful events that deletions and proposed deletion discussions are.
    And for that very reason I am grateful this issue was brought up here for discussion, and I hope we can end up with some clear guidance, possibly in the form of a policy page, going forward. And in any case, since it will create de facto policy, this is a discussion that needs to stay open for a good long while (there are several community members that have not yet commented whose opinion I would wish to hear before closing this), and depending on how well we manage to structure the consensus, may also require a formal vote (up in the #Proposals section). --Xover (talk) 09:03, 6 July 2020 (UTC)
  •   Oppose. It is becoming clear that a policy on incomplete works in the mainspace is going to place enormous pressure on individual editors. I think it would be more effective to start a wikiproject devoted to scan-backing works that lack scans and so on. James500 (talk) 12:14, 6 July 2020 (UTC)
    • @James500: FYI, this thread was made in order to provide an exception to the current policy of "no excerpts". A literal reading of the policy as it stands has a plausible chance of coming down delete on the mainspace pages over at WS:PD. This thread is a chance to come up with a better way to support such partial collective works. That we have several substantially incomplete and abandoned collective works lolling around in mainspace is actually the result of laxity in respect to stated policy (not to say I think it's a bad thing). The deletion proposals, whatever you may think of them, are actually not in contradiction to policy. That said, as always, there is scope to adjust policy. Which is what this is.
    • Now, in terms of a WikiProject to scan back works, I think that is a good idea. See #Re-purpose_WikiProject_OCR_to_WikiProject_Scans above, which proposed to reboot Wikiproject OCR as a scan-backing Wikiproject. Inductiveloadtalk/contribs 14:40, 6 July 2020 (UTC)
      • The policy says "When an entire work is available as a djvu file on commons and an Index page is created here, works are considered in process not excerpts." A literal reading of that policy is that no scan-backed work is an excerpt (it is expected to be completed eventually). Further the policy refers to "Random or selected sections of a larger work". A literal reading of that expression is that it does not include lists of scans, or auxiliary content tables, as they are not "sections" (they are not part of the work), and that not every incomplete portion of a work is either "random or selected" (which would not include starting from the beginning and getting as far as you can, with intent to finish later). I could probably argue that an encyclopedia article or periodical article is a complete work. James500 (talk) 15:16, 6 July 2020 (UTC)
  • Nice wall of text, Xover (and I say that with great respect!) -- it generally makes sense and sounds good to me. As another hopefully illustrative example, take The Works of Voltaire, which I've been digging thru lately. I think this would very much satisfy your criteria as a large work, with sufficient scaffolding to justify the mainspace pages that exist for it. I would love to hear others' thoughts on that. JesseW (talk) 16:07, 6 July 2020 (UTC)
    @JesseW: Yeah, apologies for the length. Brevity is just not my strong suit.
    The Works of Voltaire probably qualifies on sheer scale of work, yes. I don't think the current wikipage at The Works of Voltaire is quite it though: as it currently stands it is more WikiProject than something that should sit in mainspace (its contents are for Wikisource contributors, to organise our effort, not our readers, who want to read finished transcriptions). It also mixes a work page with a versions page in a confusing way. So I would probably say… Move the current page to Wikisource:WikiProject Voltaire; create a new The Works of Voltaire as a pure versions page, linking to…; The Works of Voltaire (1906), that is set up as a work page with the cover and title (and other relevant front matter) of the first volume, and an AuxTOC (and possibly also the {{Works of Voltaire}} volume navigation template). I don't know how tightly coupled the volumes of this edition are (does the first volume have a common ToC or index of works for all the volumes?), so some flexibility on format may be needed to make sense. But as a base rule of thumb it should start from a regular works page and deviate only as needed to accommodate this work (mainly the size is different).
    In any case… With a volume or two completed (they're only ~350 pages each) I'd be perfectly happy having something like that sitting around. With less than that I'd possibly be a bit more iffy, but it's hard to put any kind of hard limit on that. And with somebody actively working on it I'd be in no hurry whatsoever regardless of current level of completion.
    PS. I'm pretty sure a large proportion of the contents of these volumes are works that would qualify under "standalone works" that could exist independently in mainspace, regardless of what's done with the The Works of Voltaire page. Even his individual poems and essays can presumably make a credible claim here (because it's Voltaire; less famous authors would have a higher bar). Better as part of the edition, but also acceptable on their own. --Xover (talk) 16:56, 6 July 2020 (UTC)
  • @JesseW: I personally take no issue with this page's existence (actually I think it's a nice piece of work and a good way to allow an important author's works to be slotted in piece-by-piece). I have some general comments which overlap with this thread (written before Xover's reply, so pardon the overlap):
    • First off, I differ with Xover in terms of the scan links: I think they're better than nothing, and I don't see much value in duplicating the volume list onto an auxiliary page just to add scan links. However, I can sympathise with the sentiment that our mainspace shouldn't direct users off-wiki (or at least off-WMF). But if we don't have the scans, and that's what the user wants, they're leaving anyway. Real answer: import moar scans!
    • No scan links are necessary where the volume exists in mainspace and is scan-backed (e.g. v3)
    • Ext scan links should only be used when there is no Index page or imported scan. Use {{small scan link}} or {{Commons link}} when possible (e.g. v2)
    • The first volume list could probably be in an AuxTOC to mark it out as WS-generated content.
    • The "Other editions" section belongs on an auxiliary namespace page (Talk, Portal or Wikisource). I suggest the Talk page is best in this case. Inductiveloadtalk/contribs 17:35, 6 July 2020 (UTC)
  • @Xover: I am in agreement with the majority of what you say. Particularly, I think a framework around any collective work (be it a single-volume biographical dictionary or a 400-issue literary review spanning 80 years) is the critical prerequisite, plus at least some scans, the more the merrier. Where I think I differ:
    • I am inclined to be a bit more relaxed in terms of how much of a work we need. As long as a single article exists, it's not "trivial" (e.g. only a short advert or some incidental text like a "note to correspondents", as opposed to an actual article), it's well-formatted and scan-backed, and a complete framework exists, including front matter and a TOC, such that it is easy for anyone to slot in new pieces, I'd be fairly happy. Lots of periodicals have all sorts of tricky bits like tables of stocks or weather tables, and writing into policy that those must be proofread in order to get the "real" articles into mainspace would have a chilling effect, in my opinion. If you allowed an exception, it would be verbose and tricky to capture the spirit without saying "unless, like, it's totally, like, hard, man".
    • I am not dead against scan links in the mainspace at the top level, when such a top-level page exists. See my comments on Voltaire above. I am against them where they could sensibly be on an Author page and they are the only mainspace content.
    • I am ambivalent on the presence of, e.g., disjointed train timetables. It's not my thing to have a smattering of random timetables, but as long as they're individually presented nicely, it's not too offensive to my sensibilities. I might question the sanity of someone who loves doing tables that much, but whatever floats the boats! Also, I think that this might circle back to "good for export" - a mark which certainly would require completed issues or volumes. If you want to get that box ticked, you have to do it all.
    • Re the "notability" aspect of individual articles, I'm not really bothered by that, as I don't think we'll see a flood of total dross because few people really want to take the time to transcribe 1867 articles about cats in a tree from the Nowhere, Arizona Daily Reporter, and, actually I think some of the "dross" can be quite interesting in a slice-of-life kind of a way (always assuming well-formed and scan-backed). And the real dross is usually so bad (no scans, raw OCR, etc) that it can be dealt with outside of this topic. I think part of the value of WS is the tiny, weird and wonderful, not just in blockbusters like War and Peace and Pulitzers. I think I might like to see more of our articles strung together thematically via Portals, but that's another day's issue. Inductiveloadtalk/contribs 17:35, 6 July 2020 (UTC)
      • @Inductiveload: We appear to be mostly in agreement. But… instead of me dropping another wall of text on the remaining points of disagreement, maybe that means we're in a position to try to hash out a draft guidance / policy type page with the rough framework? Then we could go at the remaining issues point by point. Because I think I'm in with a decent chance to persuade you to my point of view on at least some of them, but this thread is fast getting unwieldy (mostly my fault). It would also probably be easier for the community to relate to now, and much easier to lean on in the future. --Xover (talk) 18:31, 6 July 2020 (UTC)
        • @Xover: If there are no more comments forthcoming after a couple of days, I think that makes sense. I don't want to railroad it: considering we have at least one !vote for "do nothing", I'd like to see if there are any other substantially different opinions floating about. Inductiveloadtalk/contribs 17:41, 7 July 2020 (UTC)

The quantity of text here has grown far faster than my ability to absorb it, so rather than continue to put it off, here's my position: I don't see any problem with transcriptions that are scan-backed, even if the transcription only covers a small fraction of the entire scan. If Sally chooses (say) to transcribe a favorite story, that happened to be published in an issue of Harper's back in the 1890s, and goes to the trouble of uploading the full issue, but only creates pages for the one story that interests her, I think that's great. It doesn't matter to me whether she intends to work on the other pages or not. If it's not scan-backed, but it's fairly high quality, I am personally willing to do some work trying to locate a scan and match it up to the text; I'd rather we take that approach, than deletion, though of course deletion is the better option in some cases where the scan is very hard to come by.

If all this has been said above, or if I've misunderstood the topic, my apologies. Please take this comment or leave it, as appropriate. -Pete (talk) 02:00, 8 July 2020 (UTC)

Apologies, I see I had missed the point.

I disagree with Xover's statement that a top-level page for a publication, with a link only to a single article within the publication, has "near-zero value." Such a page can serve an important function linking content together in ways that help the reader (and search engines) find the content they're looking for, or understand the context around it. For instance, A Critical Dictionary of English Literature is linked from the relevant Wikidata entry. The banner on the Wikisource page clearly tells a Wikisource reader that they won't find a full transcription here; and with a simple edit, it could link to a full scan on another site, or (with perhaps a little more effort) even transcription links here on Wikisource. This page has been here since 2010; we don't have any way of knowing what links might have been created elsewhere in the intervening decade. (I do think that new pages like this should not be created without a scan at Commons to be linked to.) -Pete (talk) 02:12, 8 July 2020 (UTC)

I'm really bad with walls of text, so I have only read a tiny portion of the above discussion. But I want to mention a couple of things that I think are worth considering in this discussion.
  • Most of the time, a mainspace "work" that is only a table of contents, but which has none of the actual content, and is not actively being worked on, can be (and should be) deleted as No meaningful content or history under our deletion policy.
  • A mainspace work that has only a little bit of content, but that content is a work unto itself within the scope of Wikisource, should be kept. Most periodicals are like this. For an example, see the Journal of English and Germanic Philology which only has one hosted article, but that hosted article is scan-backed and firmly within scope.
  • On some occasions, empty mainspace works do have value. I ended up creating the page The Roman Breviary, despite it containing no actual content, mostly because there are a lot of works that link to it, using many different titles, and if someone uploaded a copy of the work under one title then many of the links would remain red because they point to different titles of the work. This could be easily solved by creating redirects to a simple placeholder page, so I did. I tried to make the placeholder page as useful as a placeholder page can be, as it contains useful information about the history and authorship of the work, and links to the Index pages where the transcription will take place.

Anyway those are my 2 cents, sorry if they are redundant —Beleg Tâl (talk) 00:40, 29 July 2020 (UTC)

Proposal

Since there has been no extra input for a month, and not wanting this section to get archived without at least attempting a proposal, I have started a proposal #Collective work inclusion criteria above. Inductiveloadtalk/contribs 11:00, 25 August 2020 (UTC)


I've created Bradshaw's Monthly Railway and Steam Navigation Guide (XVI) - it couldn't be done on one page, due to the very high number of template transclusions. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:52, 1 September 2020 (UTC)

Call for feedback on archiving POTUS tweets

I would appreciate hearing the community's thoughts on archiving President Trump's communications to the public via tweeting.

If you are new to the topic of the status of POTUS tweets, this article from NPR is a good introduction which happens to namecheck Wikipedia while discussing crowdsourcing of Presidential records.

My take is that post-11/3/2016 tweets from the @realDonaldTrump account - even those that have been subsequently deleted - are official Presidential records within the scope of being archived here. Here is why I believe this:

  • The Presidential and Federal Records Act was amended in 2014 to expand the definition of records to electronic content, including social media communications. The Obama administration complied with this by auto-archiving Obama's posts made from the @POTUS twitter account, and publishing a searchable archive of those tweets shortly before he left office. link
  • Trump's press secretary said on June 6, 2017, when asked whether POTUS tweets are official statements: "The President is the President of the United States, so they're considered official statements by the President of the United States."
  • Trump affirmed that he considered tweeting part of his presidential duties in July 2017 when he tweeted that "My use of social media is not Presidential - it's MODERN DAY PRESIDENTIAL."
  • This issue of the status of deleted POTUS tweets was asked about in this letter from two U.S. Senators to the Archivist of the United States. The Archivist responded that the National Archives and Records Administration "...has advised the White House that it should capture and preserve all tweets that the President posts in the course of his official duties, including those that are subsequently deleted, as Presidential records, and NARA has been informed by White House officials that they are, in fact, doing so." link
  • On March 13, 2018 Secretary of State Rex Tillerson learned that he was fired via twitter. The firing announcement was tweeted from the @realDonaldTrump account. The @POTUS account set up by the Obama administration, which during the Trump administration has consisted mostly of retweets from @realDonaldTrump, was silent on the firing. This is an example of why there is general agreement that when someone talks about "President Trump's tweets", they are referring to those from the @realDonaldTrump account.

Wikimedia Commons has two screengrabs of @realDonaldTrump tweets archived there, and some content sourced to Congressperson twitter accounts. Since there hadn't been any discussion specifically about the copyright status of POTUS tweet screengrabs I asked for clarification there. They agreed with my take that a screengrab of a basic POTUS tweet showing text and a profile picture is PD-USGOV, but that a screengrab showing anything more within it has to have those interior items separately evaluated, and blurred out if they are not PD.

Thanks! Dennis the Peasant (talk) 02:51, 10 October 2020 (UTC)

Unfortunately, given the above notes on Copyright status and the guidance at WS:WWI on documentary sources, they do appear to meet the criteria for being here. However, per the precedent exclusions given at WS:WWI, they must be complete and not fragmentary. I would expect them to be verifiable on Wiki. I say "unfortunately", because I'm not convinced that they will have a long-term value here at enWS. They will be archived in other places, because of what they are. I would anticipate that they would become a vandalism target, as are the letters from the Zodiac killer. Beeswaxcandle (talk) 04:41, 10 October 2020 (UTC)
  • The copyright status is a separate issue (and, NB, note that retweets are not PD-USGov under any circumstance!); my main concern is that these do not fit the purpose of Wikisource. There are lots of services that archive tweets and there is very little we can do to add value to them. They are some kind of bastard hybrid between off-the-cuff verbal communication and extremely informal and short written communication. They are not published in any sense that is relevant to our inclusion criteria. With a book or news article-style publication, subject to editorial control, sure: we could figure out the copyright situation and, if compatible, host. But indiscriminate inclusion of all, or a random excerpt of some, of an account's tweets makes absolutely zero sense. If any tweets should be permitted it would certainly be the tweets from a sitting President of the US, but I just don't see it. This is not what Wikisource is for. --Xover (talk) 07:03, 10 October 2020 (UTC)
  • I agree with the above, this is not a good use of Wikisource. On the other hand, we could definitely host content along the lines of The Tweets of President Donald J Trump (2020) provided that the work as a whole is freely licensed or PD. —Beleg Tâl (talk) 12:19, 10 October 2020 (UTC)
    Indeed. --Xover (talk) 12:41, 10 October 2020 (UTC)
  •   Comment I don't see it within our scope. The overarching conversations and the retweets are not within scope, and by their nature they are neverending conversations. Trump's tweets are excerpts of the conversations. Aside from that, I don't see that it falls within our stated scope of published works. — billinghurst sDrewth 13:24, 10 October 2020 (UTC)
  • comment there are other sites doing this work, http://trumptwitterarchive.com/archive and can be a citation for quotes. this community tends to concentrate on excavating reference texts not available elsewhere. Slowking4Rama's revenge 15:19, 10 October 2020 (UTC)

Thanks for the helpful, albeit discouraging comments!

As I noted above, the Presidential and Federal Records Act Amendments of 2014 revised the definition of official "records" to include all recorded information, regardless of form or characteristics. To summarize the feedback, it seems that a subset of Presidential official records, including but presumably not limited to posts on Twitter, possess characteristics which put them outside the scope of Wikisource.

To help out future Wikisourcians thinking about archiving Presidential and Federal Records, may I ask for clarity on what exactly are the forbidden characteristics? Length, formality, interactivity, possible vandals, lack of publication elsewhere, and the existence of other archives have all been mentioned, what are the red lines in these categories?

Thinking about other social media platforms commonly used by Congresspeople and Presidents, are reddit or Facebook posts (which typically exceed 280 characters but can involve interactivity) also outside of the scope of Wikisource? How about longer posts, without any interactivity, on a digital-only platform like Medium?

I'll toss out two test cases of digital Presidential communications which may help structure the discussion. Here is the URL to an archived Medium post by Obama: medium.com/obama-white-house/to-my-fellow-americans-649af4c5fc49; it is lengthy, contains images but no hyperlinks, and is not part of any conversation. To me it reads like an ordinary press release, or a transcript of a speech. The post's embedded images would certainly be OK to upload on Commons. Does archiving the text of this post fall within the scope of Wikisource, alongside the existing material at Author:Barack_Hussein_Obama?

For a second test case let's consider a tweet, from Obama to separate out the issues of potential vandals and alternative archives. With Obama's Twitter communications, the administration complied with its archival responsibilities in two ways. The most public archive is the @POTUS44 account which had all Obama @POTUS tweets migrated to it. Currently this account is easy to access and use, but of course there is nothing preventing Twitter from going out of business, deciding to delete the account, putting the information behind a paywall, etc.

The administration also made available for download a zipped archive with the text of the tweets in CSV and JSON formats, and included an html file to allow searching and reading within a browser. While this form of archiving has a lot going for it, it requires multiple actions and software to get the browser access going, and while this functionality worked well on my desktop, I couldn't get it to work on my Android phone. Additionally, the raw data is incomplete (ending on 11/16/2016), and in minor aspects often wrong (many tweets are mislabeled as retweets, probably due to the migration activity).

It seems to me that the public would benefit (admittedly, only a tiny bit) by having access to an archive of Obama's tweets in an easily readable and searchable format outside of Twitter. These would have to be reformatted from CSV or JSON to be readable, and the t.co redirection links would need to be replaced with URLs to their destination. These tasks are straightforward to automate, and here's a sample reformatted tweet:

This sample POTUS tweet seems pretty anodyne to me, but it seems the community feels strongly that archiving tweets like it does not fall within the scope of Wikisource. OK, but why? The brevity? Thanks again! Dennis the Peasant (talk) 20:10, 11 October 2020 (UTC)
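As a rough illustration of the reformatting step mentioned in the post above (JSON record to readable text, with t.co links swapped for their destinations), a minimal Python sketch might look like the following. The field names (`created_at`, `text`, `entities.urls`, `url`, `expanded_url`) are assumptions modeled on the usual Twitter JSON layout and may not match the downloadable archive exactly; the sample record is invented purely for illustration and is not a real tweet.

```python
import json


def render_tweet(tweet: dict) -> str:
    """Return a one-line readable form of a tweet record."""
    text = tweet["text"]
    # Swap each shortened t.co link for its recorded destination, if any.
    for link in tweet.get("entities", {}).get("urls", []):
        text = text.replace(link["url"], link["expanded_url"])
    return f'{tweet["created_at"]}: {text}'


# Hypothetical record, invented for illustration only (assumed field names).
sample = json.loads("""
{
  "created_at": "2015-10-01 14:27",
  "text": "Read more: https://t.co/abc123",
  "entities": {"urls": [{"url": "https://t.co/abc123",
                         "expanded_url": "https://example.org/report"}]}
}
""")

print(render_tweet(sample))
```

A tweet whose record carries no link list is simply returned with its text unchanged.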

i tend to be more tolerant of scope than most, but i have several questions: who is going to transcribe and maintain this? who is going to build the index? how are you going to find anything? where is the pdf text? did you upload the text to internet archive? how are you going to deal with deleted tweets? you realize how large the federal government document backlog is? you realize this community gets grumpy when people dump non-scan backed text and leave? you realize that archiving social media is a challenge for the library of congress and national archives? Slowking4Rama's revenge 03:48, 12 October 2020 (UTC)
^ this is exactly how I feel as well. —Beleg Tâl (talk) 17:06, 12 October 2020 (UTC)
sorry to rain on your bright idea. the problem being, there are a lot of bright people here with ideas; the sticking point is always the implementation plan, and the team recruitment. (it is a wikimedia pain point) Slowking4Rama's revenge 01:40, 13 October 2020 (UTC)
I appreciate the questions, and apologize for the delay in answering them. Deletion is a thorny issue, so allow me to pivot from suggesting we archive President Trump's tweets [2016 - present] to suggesting we archive President Obama's tweets [2015-17], a simpler project. We can move on the Trump case later, if warranted. So with the proposal on the table now being to archive Obama's @POTUS tweets, on to your questions:
Who is going to transcribe and maintain this? I am volunteering to transcribe them, and since this is a pretty small project I wouldn't need collaborators although I would welcome them. I'm also happy to work on their maintenance, although since Obama's are static I do not know what is needed beyond keeping the pages on my watch list to catch vandalism.
you realize that archiving social media is a challenge for the library of congress and national archives? Yes I am aware of the challenge, and the very rapid pace of software development further increases the difficulty. With Obama's tweets, the National Archives and Records Administration (NARA) has taken action - they maintain the @POTUS44 archival Twitter account - but I don't know of any other archiving actions by them.
where is the pdf text? did you upload the text to internet archive? Currently there is no official pdf text archive of Obama tweets to scan and upload, but one can make links at each tweet here to the corresponding tweet at the official NARA online archive. (I did this in the above sample Obama entry, it's the first link.) So each tweet archived here would be readily verifiable, in perpetuity since the NARA is maintaining the archives.
you realize this community gets grumpy when people dump non-scan backed text and leave? Understandable, but in this case there are no backing documents existing on paper or as pdf. So what is the verification process, or does one need to be decided upon?
Commons has a "trust but verify" copyright verification process - if an uploader claims that some content is CC licensed at Youtube, it is posted but with an automatic notice that an admin will verify this claim is true at some point. Maybe something similar could be done here, with an admin or proofreader clicking on each verification link after initial posting, and then noting on the page's notes that the transcription checks out.
who is going to build the index? I volunteer to also build an index, perhaps one modeled on the index for Obama's Presidential Weekly Addresses would work. I envision 20 pages (one for each month), with subsections for each day.
how are you going to find anything? I anticipate three major ways:
  • People who are interested in a subject would use keyword searching
  • People interested in a specific time period would navigate using the index and the pages' TOC
  • People interested in a specific tweet could find it either through searching (if they know some specific wording) or via timestamp anchors (if they know the date and time of the tweet).
The timestamps also offer an easily sharable entry to the archive, as the URL will indicate the month, day and time of the tweet. So if one shared a Wikisource URL containing "/wiki/President_Obama_Tweets_2015-10#01-02:27PM", it is clear that the link refers to an Obama tweet from 10/1/2015, tweeted at 2:27PM EST.
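The anchor scheme described above can be sketched in a few lines of Python. This is only an illustration of the proposed naming pattern, taking the page title prefix and the `DD-HH:MMAM/PM` anchor form from the example URL; the function name is hypothetical.

```python
from datetime import datetime


def tweet_url_path(when: datetime) -> str:
    """Build the page path and timestamp anchor for a tweet posted at `when`."""
    page = f"President_Obama_Tweets_{when.year:04d}-{when.month:02d}"
    # Convert to a 12-hour clock by hand so the result does not depend on locale.
    suffix = "AM" if when.hour < 12 else "PM"
    hour12 = when.hour % 12 or 12
    anchor = f"{when.day:02d}-{hour12:02d}:{when.minute:02d}{suffix}"
    return f"/wiki/{page}#{anchor}"


print(tweet_url_path(datetime(2015, 10, 1, 14, 27)))
```

One thing such a scheme would need to settle is what happens when two tweets share the same minute, since their anchors would then collide.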
you realize how large the federal government document backlog is? Yes, but it is natural to update which documents are archived. Trump discontinued the time-honored Presidential Weekly Address tradition entirely in June 2018 in favor of other forms of communication, the most important of which (for him) is tweeting. Since Presidential Weekly Addresses are no longer given, it makes sense to think about archiving the communications which displaced them.
And while POTUS Twitter communications are sometimes no more than barbaric yawps, there have been others which have had great historical significance. As a category, it seems to me that they deserve to be archived here. Dennis the Peasant (talk) 06:25, 24 October 2020 (UTC)

This discussion is about to get archived without any comments on my last post. @Slowking4, @Beleg Tâl: did you have any thoughts about my responses to your questions? @Billinghurst: can you interpret for me how this discussion comes down on the question of whether I should go ahead with the archiving of Obama tweets? Thanks! Dennis the Peasant (talk) 05:01, 17 November 2020 (UTC)

I don't see a consensus of opinion that a tweet or a collection of tweets are in scope, nor that we should expand our scope to have them included. My personal opinion is unchanged. — billinghurst sDrewth 05:43, 17 November 2020 (UTC)
I agree with Billinghurst on this one. --Xover (talk) 06:15, 17 November 2020 (UTC)

Scottish Chapbooks with blurred pages.

Just for information, I have noticed some of the Scottish Chapbooks are displaying blurred pages.

I have found that by clicking the link to the original source page you can download the page and it is not blurred.

I'm not sure of the cause of this issue, but thought I'd flag it here in case anyone else is having difficulty with them.

For example a recent blurred page I encountered is Index:Young Gregor's ghost in three parts (NLS104184433).pdf with the clear page found at [2]

Sp1nd01 (talk) 12:25, 27 October 2020 (UTC)

This is caused by over-compression by the LuraDocument compressor used by NLS, which looks like it's failed to separate text and image on the first page. Either the PDF can be regenerated at our end from the source images, which needs a bit of faffing about, or maybe NLS can just re-run the derivation step. I have no idea if their workflow can do that easily. @LilacRoses: any idea? Inductiveloadtalk/contribs 12:34, 27 October 2020 (UTC)
Hi @Inductiveload:, apologies for the long wait. I have looked into this issue. Since their initial upload to Wikimedia, I understand our LuraTech compressor settings and version have been updated and therefore we would be able to redo the PDFs, but it would need to be done as more of a batch rather than individual items. With that in mind, I wondered if it would be possible to replace the files on Wikimedia Commons which connect to the Wikisource items without too much of an issue? It's an area I'm not too familiar with so any guidance on this would be much appreciated. The main concern I have is that I would want the new file to link to the same item on Wikisource, so for the example used above, if we were to redo the file https://upload.wikimedia.org/wikipedia/commons/6/66/Young_Gregor%27s_ghost_in_three_parts_%28NLS104184433%29.pdf, the new file would need to still link to https://en.wikisource.org/wiki/Index:Young_Gregor%27s_ghost_in_three_parts_(NLS104184433).pdf as we use these links in order to find and track the items, and it will really complicate things on our end if this is changed.
If you are able to please advise on the best way to overwrite the files, that would be great! Thanks in advance, LilacRoses (talk) 12:47, 16 November 2020 (UTC)
@LilacRoses: hi! It's easy to add new versions of files at Commons:
  • Go to the commons page commons:File:Young Gregor's ghost in three parts (NLS104184433).pdf
  • Find the "Upload a new version of this file" link which is just below the "history" table
  • Follow that link and select a new file to upload from your computer
  • Enter a description (e.g. "regenerated PDF with clearer compressor settings")
  • Click OK and let it upload
  • As long as the file has the same pages as the original, the Wikisource index will not need any changes and the page images will update automatically. Sometimes there is a short delay while the caches regenerate at Commons.
  • The whole process can be automated - Pywikibot and friends will happily do this as well (I'm not sure how NLS has been uploading files).
Tl;dr update at Commons, everything else "just works". Inductiveloadtalk/contribs 13:37, 16 November 2020 (UTC)
@Inductiveload: thank you for the help! It will be scheduled to be done later this month. Pattypan was used to upload the files to Commons. As we have quite a few blurry items, do you have any guidance on how the process could be automated? Thanks in advance! LilacRoses (talk) 14:41, 16 November 2020 (UTC)
A script to upload files would be pretty simple using Pywikibot. It really depends what data you have. As long as you know the filename at commons, it's very easy. Otherwise if you only have the NLS ID, you might need to first search for the file ending in (NLS xxxxxxxx).pdf. It really depends how many files there are to replace: if there are 10, it'll be quicker to do it manually, if there are 10k, then you need a script (or you need to buy the interns more coffee and pizza). Inductiveloadtalk/contribs 14:50, 16 November 2020 (UTC)
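For the batch case, a minimal Pywikibot sketch might look like the following. It assumes the regenerated PDFs sit in one local folder and keep exactly the same filenames as the existing Commons files (ending in `(NLS<id>).pdf`); the function names, the folder layout and the upload comment are hypothetical, and the `site.upload` call should be checked against the installed Pywikibot version before running it on real files.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# Matches the "(NLS104184433).pdf" suffix used in the Commons filenames.
NLS_SUFFIX = re.compile(r"\(NLS(\d+)\)\.pdf$")


def nls_id(filename: str) -> Optional[str]:
    """Return the NLS identifier embedded in a filename, or None if absent."""
    match = NLS_SUFFIX.search(filename)
    return match.group(1) if match else None


def plan_uploads(folder: Path) -> List[Tuple[Path, str]]:
    """Pair each local NLS PDF with the Commons title it should overwrite."""
    return [(path, f"File:{path.name}")
            for path in sorted(folder.glob("*.pdf"))
            if nls_id(path.name)]


def upload_all(folder: Path, comment: str) -> None:
    """Upload every planned file as a new version of the existing Commons file."""
    import pywikibot  # imported here so the helpers above work without it

    site = pywikibot.Site("commons", "commons")
    for path, title in plan_uploads(folder):
        page = pywikibot.FilePage(site, title)
        # ignore_warnings=True lets the upload overwrite the existing file.
        site.upload(page, source_filename=str(path), comment=comment,
                    ignore_warnings=True)


print(nls_id("Young Gregor's ghost in three parts (NLS104184433).pdf"))
```

Running `plan_uploads` on its own first gives a dry-run list of what would be overwritten, which seems worth checking by eye before calling `upload_all`.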

Self-published lectures

What is our attitude to works like On Marquez's One Hundred Years of Solitude, originally self-published at [3]? I personally can imagine inclusion of such works, but Wikisource:What Wikisource includes states that works hosted at Wikisource "… must have been published in a medium that includes peer review or editorial controls; this excludes self-publication." Is it possible to alter the particular criterion to include such works somehow (e.g. making an exception to selfpublished lectures of well-known authors), or should this work rather go? --Jan Kameníček (talk) 14:38, 29 October 2020 (UTC)

  • Whatever the criteria, this work (and the others listed on his author page) should definitely be included. If a change is necessary (which it really is for all guidelines), it should occur. TE(æ)A,ea. (talk) 16:23, 29 October 2020 (UTC).

OK, so to keep the work here as well as enable inclusion of others from Author:Ian Courtenay Johnston#Lectures (in fact I am considering adding some of them here), I suggest replacing the part of a sentence "…this excludes self-publication" with "This usually excludes self-publication; rare exceptions can be considered provided that the writer of the self-published analytical work is a renowned academic author". --Jan Kameníček (talk) 08:25, 30 October 2020 (UTC)

It was accepted at the time, it is in scope. Anything can be discussed within scope, as there are edge cases. I don't think that we need any change in the policy or the wording, just bring forward items that are those edge cases. We already have many self-published old works, the rule is primarily aimed at conflict of interest and self-interest additions. — billinghurst sDrewth 09:02, 30 October 2020 (UTC)
How can something about which our rules say that it cannot be included to WS be in scope?
Johnston’s self-published lectures are definitely not an edge case at the moment. Currently the rule clearly and explicitly says that such works are excluded and forbids adding them, which is a pity. Here we can state an opinion that such works can be included, but generally it does not solve anything if we do not write this opinion into the rule’s page. If later some other contributors come to similar cases and they start wondering whether the work can be added to WS, they will most probably (as I did) go to the "What Wikisource includes" page, where they will learn that the work cannot be included. So, if the work can be included, the rule should reflect it.
@"the rule is primarily aimed at conflict of interest and self-interest additions": I am not sure if this is only an opinion or a fact. The rule itself does not say it.
One more problem: Let’s say that on the basis of the opinion expressed above I will add the lectures from the list to WS. That would require some amount of work. How can I be sure that later it will not be deleted because of the current rule stating explicitly that such works are excluded? It cannot be required that people should work against our rules. If something can be acceptable, the rules should at least admit that it can be acceptable. Edge cases will always happen, but this is not an edge case, it is clearly behind the current fence.
Suggested addition makes adding such works possible and makes it known to everybody searching for such information in our rules. --Jan Kameníček (talk) 09:57, 30 October 2020 (UTC)
The word "renowned" makes this suggested change untenable for me. Who is to define renowned in any particular case? How broadly should the person be known as an academic author? Outside their institute of higher learning? Outside a geographical region? Outside their discipline? I'm also not convinced that a publication of a lecture given in the context of a course of learning is covered by a clause titled Analytical and artistic works. The lecture in question is neither.

In terms of self-publishing, what is the reason the work was self-published? To prevent censorship or suppression? Or self-aggrandisement? What's the difference between the vanity presses of the late 19th and early 20th centuries and the blog-posts of today? What's the distinction that allowed us to take Tom Lehrer's lyrics from his website and put them up, but not a piece of fan-fiction? I don't have definitive answers to these philosophic questions, other than to note that the Consensus section of WS:WWI allows us to agree to include or not include particular works by discussion here. A policy is meant to be read and applied in toto. Beeswaxcandle (talk) 18:04, 30 October 2020 (UTC)

Amen. If someone has a work that they wish considered by enWS then point to it, and ask about it. In the range of the works that we reproduce they are definitely edge cases. And of course that part of the rule is aimed at modern self-addition, how else will we exclude someone from publishing their poetry, their writings etc.? While irregular now, it used to be a consistent issue. — billinghurst sDrewth 13:15, 31 October 2020 (UTC)
OK, so I am leaving my attempt for more general pardon of such works.
Despite that, it seems that nobody raised any objections against adding Johnston’s lectures as such. Unless some objections appear, I will probably add some of them to WS. --Jan Kameníček (talk) 22:24, 31 October 2020 (UTC)
@Jan.Kamenicek: Just for the record, I think this should ideally have led to an amendment to the policy to make the scope for discussing edge cases clear and explicit, and for pretty much the reasons you articulated above. I think Beeswaxcandle makes good points that should be addressed, but I see that as a matter of "how best to" and not "whether to". But as I don't have the spare cycles to participate meaningfully in such an effort, I'll limit myself to just expressing general support for the idea and leave it at that.
PS. Please link to this discussion from the works' talk page or similar (even having it in an edit summary helps), so any future deletion discussion will have easy reference to it. --Xover (talk) 07:37, 2 November 2020 (UTC)
"How can something about which our rules say that it cannot be included to WS be in scope?" = IAR. i find all the rules lawyering, and amendment, and strict constructionism, to be a waste of time. go ahead and rewrite rules if it makes you feel better, but it will not stop a deletion, if a rouge admin wants to assert "out of scope" as we have seen on other projects (like commons) Slowking4Rama's revenge 01:50, 3 November 2020 (UTC)
  This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. Jan Kameníček (talk) 22:25, 25 November 2020 (UTC)

Tech News: 2020-45

16:09, 2 November 2020 (UTC)

Dotted line

Hi. Is there any template that simply can draw a dotted line? --Yousef (talk) 07:59, 3 November 2020 (UTC)

Try {{***}} (change the character as needed) for one on a separate line. Use {{...}} for an in-line line. Beeswaxcandle (talk) 08:07, 3 November 2020 (UTC)

Country Diary

I've started to copy over copyright-expired Country Diary entries by Thomas Coward from The Guardian - there are over 230 of them to do. For example:

How could their headers be improved? How about linking between them ("next" and "previous")? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:06, 4 November 2020 (UTC)

@Pigsonthewing: I suggest:
Inductiveloadtalk/contribs 12:16, 4 November 2020 (UTC)
@Inductiveload: Thank you. I went for the former for now, as we currently have no other content from the relevant issues. I would still like to improve what is in the headers (currently "The Guardian by Thomas Coward"). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:03, 4 November 2020 (UTC)
@Pigsonthewing: Sorry, that wasn't very clear! The two points were orthogonal. I meant "alternately" to using "next/previous" fields. It's kind of moot since the chances of adding more content from the exact issues is, shall we say, slim.
Other daily newspaper headers use something like [[../../../../]], [[../|6th July, 1920]] for the header title field, and then "Country Diary" for the section field. Inductiveloadtalk/contribs 14:11, 4 November 2020 (UTC)

Wiki of functions naming contest - Round 2

22:11, 5 November 2020 (UTC)

Template help

I looked in Category:Content templates and Category:Article templates (which seem like they should be merged, a separate point) and could not find a template for the top just basically saying "This article needs someone to come fix the endnotes/footnotes/whateveryoucallthem". If anyone knows it and wants to add it to my latest additions, much thanks. Wikisorce (talk) 04:53, 6 November 2020 (UTC)

@Wikisorce: Are you looking for Help:Footnotes and endnotes? Use <ref></ref>. billinghurst sDrewth 07:19, 6 November 2020 (UTC)
i think they want a w:Template:Unreferenced, w:Template:More_citations_needed. but we do not really tag spam here like english wikipedia; we tend to link to pages to fix and people fix them. and we do not really merge categories either.
they seem to be scraping and text dumping google books, without uploading to commons from IA. maybe some coaching about the scan backed process would be helpful. Slowking4Rama's revenge 23:36, 8 November 2020 (UTC)

@Slowking4: Thanks it must have been late at night I missed that componentry. @Wikisorce: We typically don't just scrape text, it is just a rubbish way to do book transcriptions. It was tried in the early days, and it was discarded as that crap just stayed as crap, and was too hard to fix. We rely on image files at Commons as a means to get reputable transcriptions, that can be further validated. — billinghurst sDrewth 23:56, 9 November 2020 (UTC)

three of the five works by Lysander Spooner, are at IA, and i linked to them at the page header. i leave the upload at commons, to the student. our new editor appears not to have returned. Slowking4Rama's revenge 22:10, 11 November 2020 (UTC)

Need help for updating proofreadpage update on Telugu Wikisource

I initiated an update for Proofreadpage to try pagelist widget. In the process the index page edits are not showing the already populated entries. As part of this process I have deleted "MediaWiki:Proofreadpage index attributes". While trying to delete "MediaWiki:Proofreadpage_js_attributes", I created a blank page. Several years back, I think user:billinghurst helped setup proofreadpage on Telugu wikisource. This site uses MediaWiki:Proofreadpage_index instead of MediaWiki:Proofreadpage_index_template as mentioned on Proofread extension page. Pagelist widget error on Telugu wikisource bug can be seen for related info. Request user:billinghurst and others to help make the index page editing work again.--Arjunaraoc (talk) 06:35, 8 November 2020 (UTC)

@Arjunaraoc: te:MediaWiki:Proofreadpage_js_attributes was created by you today, I have deleted, as you should be using te:MediaWiki:Proofreadpage index data config for your config. — billinghurst sDrewth 06:49, 8 November 2020 (UTC)
@Arjunaraoc: have you stepped through mul:Wikisource:ProofreadPage/Improve index pages? It is what I need to do as I so rarely play in that space. — billinghurst sDrewth 06:54, 8 November 2020 (UTC)
And if you are using the template fill gadget, the customisation is at MediaWiki:Gadget-Fill Index.js. Re the config change for Pagelist, you can see user:Inductiveload's edit for us at special:diff/10443824, and I am not really around the rest of that configuration change. — billinghurst sDrewth 07:02, 8 November 2020 (UTC)
Thanks billinghurst for your quick feedback and pointers. I got Proofreadpage and pagelist working. --Arjunaraoc (talk) 10:25, 8 November 2020 (UTC)

Tech News: 2020-46

15:50, 9 November 2020 (UTC)

Using Wikisource tech on Wikibooks and elsewhere: MediaWiki extension

Hi all, I recently asked on en.Wikibooks whether the project could support ebook development better by bringing across the layout features that Wikisource employs, such as layout templates, navigation between chapters, and ebook export tools. Wikibooks admins however felt that this would be difficult to deploy because of decisions they had made about their own set up. In the discussion, it was suggested that the better way to do this might be to develop a MediaWiki extension that contained the core features and made it easier to deploy. Since the people here know the tech, I thought I would ask if this seems sensible. Is a MediaWiki extension the right way to share Wikisource's layout tech? Is this something this project would help with? JimKillock (talk) 04:12, 12 November 2020 (UTC)

@JimKillock: Our toggled layouts are independent of our text, and grabbing layouts is just a matter of looking what is in our mediawiki:common.js; our texts layer then just flow.

With regard to WSexport, I know that it is an independent tool that is having some rewrite. If you think that it should be plying its trade to more wikis, then it would be a matter of a phabricator ticket to get them to make those changes. — billinghurst sDrewth 08:47, 12 November 2020 (UTC)

Thanks, I have a basic knowledge of html and css, but Wikimedia / MediaWiki's technical workings are new to me. These are the items I am looking to port across
  • optional narrow single column formats;
  • optional serif fonts for body text
  • Differing styles for headings and so on (not always underlines)
  • easy navigation bars (last page, next page)
  • optional export links to common ebook formats
  • Left and right marginalia
So two questions regarding those:
  • Are all of those fairly simple CSS / JS fixes?
  • Are the engines for each of these just a Template to copy across otherwise?
And on WSExport,
  • how would I make / file a phabricator ticket? JimKillock (talk) 09:18, 12 November 2020 (UTC)
@JimKillock: there are a few parts to this.
At enWS, the dynamic layouts are provided by Javascript which forces certain content to have the switchable styles. This code is currently a mess and split among various JS and CSS files. It is planned to transition it to a proper gadget. Not all of it will be needed for Wikibooks as there is special code for the page numbers/links that's mixed up with it. It will probably be easier for Wikibooks to make their own gadget for this, as it just needs to add some CSS and then set a relevant class on the main content div.
Left and right marginalia are tricky and even enWS has not hit on a perfect solution yet.
Navigation bars are part of our templates (like {{header}}) and Wikibooks can do something similar. It's just a normal wikilink. {{header}} is probably not suitable for a direct port to Wikibooks as it's very custom-built for Wikisource, but you could use some concepts.
The WSexport tool is provided by an external service that runs on the Toolforge infrastructure. You can file bugs and issues at https://phabricator.wikimedia.org/tag/wikisource_export/. WSexport does not use the dynamic layout stuff, but it does apply special CSS from Mediawiki:Epub.css.
There is a very small gadget at enWS used to add the links for WSexport, but the rest of the logic is on the Toolforge server. Inductiveloadtalk/contribs 12:32, 12 November 2020 (UTC)
It's been pointed out to me that at first stage, the layout options are more for the editors to choose than users. The idea being that certain book formats could be chosen as a fixed decision by the editor / creator and applied by a simple template. Does that make sense? If so, I will start with that approach as it is quicker, easier and much more within my ability to action without asking for help. JimKillock (talk) 11:25, 15 November 2020 (UTC)
Hi @Inductiveload: could you point me to the relevant CSS files for the (narrow column, etc) layouts? That way I should be able to trial applying these via mw:Help:TemplateStyles. Thank you! JimKillock (talk) 21:33, 17 November 2020 (UTC)
@JimKillock:: it's not quite as simple (yet) as a straight CSS file, but the styles (such as they are) are found at Mediawiki:PageNumbers.js in the self.ws_layouts object. Inductiveloadtalk/contribs 12:40, 18 November 2020 (UTC)
@Inductiveload: thank you, that is quite a small amount of CSS per layout, just a few lines it seems. I will take a look later. JimKillock (talk) 16:01, 18 November 2020 (UTC)
@Inductiveload: Ok, I got as far as creating a template to call the CSS, and adding a CSS file at wikibooks:la:Formula:NarrowColumn/Styles.css – however something is wrong that the css parser is not picking up: I can only save the file by commenting the whole thing out. If you can spot why I would be very grateful! And sorry for asking about something trivial JimKillock (talk) 21:20, 18 November 2020 (UTC)
Hi @JimKillock:, I don’t understand the tech, but I want to be able to copy books to WB where they could be annotated. I had help from Quite Unusual at Wikibooks to produce this copy of Economic Sophisms and he transferred(?) some templates over but there seems to be little work done before or since. Could your idea cover copying books from WS to WB? Please excuse my ignorance, I am primarily a proofreader here. Cheers Zoeannl (talk) 00:57, 18 November 2020 (UTC)
The changes I am after wouldn't automate copying across, but they would help with the book layouts once copied, in particular if you wanted your book to appear with a narrow centre column, and to export to ebook format. JimKillock (talk) 08:35, 18 November 2020 (UTC)
@Zoeannl: Looking at your book, it is the same missing templates that I am playing with on Wikibooks and have copied to la.wikibooks, eg x-larger and x-smaller etc. These are very easy to move, you just copy and paste them. The templates are at /wiki/Template:TemplateName eg Template:Larger for {{larger}} - you could easily copy these over yourself. If I implement the CSS for narrow columns, that can be applied and then your presentation issues are more or less solved. And ebook export at Wikisource is next on my list. JimKillock (talk) 16:01, 18 November 2020 (UTC)
@Zoeannl: I added some of the missing templates, and applied the work in progress version of the narrow column template, so you can see how easy this is. JimKillock (talk) 18:21, 22 November 2020 (UTC)
@Zoeannl: I have done some more on the book, and it basically all seems to be working now. Can you let me know if anything more needs doing? JimKillock (talk) 20:16, 30 November 2020 (UTC)

Update: someway there

Hi all, I am someway towards migrating the relevant Wikisource features to Wikibooks, starting with Latin Wikibooks. I have two templates, to open and close the divs which make the column, and a style sheet attached to the first.

Questions:

  1. The styles don't seem to be working out the serif fonts yet. This seems to be due to the way the MediaWiki parser is working, it seems to be over-riding the font values. Any idea how I fix this?
  2. is there a better way to open and close the wrapper divs than using two separate templates, one to open them, and the other to close them?
  3. I have not understood the Wikisource template divs, I think, but just pushed the three across as per the stylesheet, you may have advice on what I should be doing here.

Thanks for any help. JimKillock (talk) 15:43, 21 November 2020 (UTC)

@Inductiveload: @Billinghurst: I am hoping one of you may be able to help with these three requests :) JimKillock (talk) 18:24, 22 November 2020 (UTC)
I am not really a complex CSS person, so I am no help to you. — billinghurst sDrewth 01:25, 26 November 2020 (UTC)
Thanks, this is resolved. More stupid errors my side.
I am still unhappy that I am using two templates, one {{NarrowColumn}} to open the divs, the other {{NarrowColumnEnd}} to close them, with the page content between them. This feels like a great way for users to break things. Is there an alternative way to do this? JimKillock (talk) 18:26, 26 November 2020 (UTC)
OK, I think I have resolved this now. JimKillock (talk) 20:18, 26 November 2020 (UTC)

WSExport and Phabricator ticket

Hi billinghurst. Earlier you remarked that "If you think that it should be plying its trade to more wikis, then it would be a matter of a phabricator ticket to get them to make those changes". How do I go about that? This is hopefully the last thing I need to do to complete this now, apologies for the many queries. JimKillock (talk) 21:20, 26 November 2020 (UTC)

This is resolved! Thank you for the information earlier. JimKillock (talk) 21:24, 26 November 2020 (UTC)

Is this document worthy of uploading to Wikisource?

I've translated this scientific article into Russian, and I thought that maybe the English version could be uploaded to English Wikisource and my translation to Russian Wikisource. The original text is under the Creative Commons license. I used a couple of images from the article in Wikipedia, and I'm thinking of uploading the translation and providing a link from the Russian Wikipedia article directly to the translation in Russian Wikisource. Is this text worthy of inclusion into Wikisource? --CopperKettle (talk) 15:55, 12 November 2020 (UTC)

I think this article is within our scope and imo it would be great if you added it here.
However, I would like to point out a different problem I have noticed. You say that the article uses some images "from Wikipedia". These images probably come from the article Cerebral folate deficiency. The images in this article do not come from Wikipedia: if you click them, you find out that they are from Commons, a sister project of Wikipedia, where they were uploaded under specific licenses stating which conditions have to be followed if somebody wants to use the pictures. For example, the licence of Commons:File:Folic acid metabolism and 5-MTHF transport across the choroid plexus epithelium in the brain.png says, among other things, "You must give appropriate credit" and "provide a link to the license", which was not done in the linked article. However, it is probably not a big problem, as the authors of the pictures are Sarah Mafi and Pierre-Antoine Faye, who are also co-authors of the article. Nevertheless, when a picture from Commons (or "from Wikipedia") is reused, generally the license conditions should be fulfilled. --Jan Kameníček (talk) 16:14, 12 November 2020 (UTC)
Thank you! It was me who uploaded the images from the paper by Mafi et al. to the Commons! --CopperKettle (talk) 17:35, 12 November 2020 (UTC)
Ah, I thought that the paper used the images from Commons, but it is the other way round. Then it is obviously OK; I am sorry for the confusing contribution. --Jan Kameníček (talk) 17:57, 12 November 2020 (UTC)

Another question

  • Actually, this work (Brain Sciences) raises an interesting question for dealing with works of this type. The issue the above article comes from (vol. 10, no. 11) has 92 articles, more than many of the early volumes of this journal had in total. The articles themselves would not be difficult to add (assuming there were enough people able and willing), but the actual issue page (Brain Sciences/Volume 10/Issue 11) would be very full, and probably difficult to search through. I have laid out how I had planned to list all of the articles on earlier sub-pages, but I feel like this method is very weak at a large scale. Any thoughts? TE(æ)A,ea. (talk) 19:09, 12 November 2020 (UTC).

OCR Gadget news

Hi all. Over at Phabricator there's been some progress made on fixing the OCR Gadget.

This function has been unreliable, and for some works wholly inoperable, since at least mid-2019. Over the last week or so there has been progress made on getting it fixed, and Inductiveload has updated our local Gadget (MediaWiki:Gadget-ocr.js). The long and short of which is that the OCR button in the toolbar should now be functional again.

There have been some unavoidable changes to surrounding infrastructure that may affect the details of the OCR text that is generated, but thanks to extremely thorough testing by Jan.Kamenicek these have hopefully been minimised.

Also, since part of the problem was due to corrupted data in the tool's cache, and this data had to be removed, works affected by this that returned OCR results really fast before will now take significantly longer for the first few pages. The first time a page from a given work is requested the tool schedules a background job to run OCR on the whole work and cache the results. Once that is complete any request for OCR returns the text from the cache immediately (fast), but in the meantime all the OCR processing must be done between the time you click the OCR button and you getting the result back (slow). For most works this processing should complete within a couple of hours.
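The caching behaviour described above can be pictured with a small model. This is only an illustrative sketch, not the OCR tool's actual code: the class name `OcrCacheModel`, its methods, and the `runOcr` callback are all hypothetical, and the real tool may schedule and store results quite differently.

```javascript
// Hypothetical model of the described behaviour; NOT the OCR tool's real code.
class OcrCacheModel {
  constructor(runOcr) {
    this.runOcr = runOcr;     // assumed slow function: (work, page) -> text
    this.cache = new Map();   // work -> Map(page -> text), filled by the background job
    this.pending = new Set(); // works whose whole-work job has been scheduled
  }

  // A user asks for the OCR text of one page.
  request(work, page) {
    const cached = this.cache.get(work);
    if (cached && cached.has(page)) {
      return { text: cached.get(page), source: 'cache' }; // fast path
    }
    // First miss for this work: schedule the whole-work job (modelled as a flag).
    this.pending.add(work);
    // Until that job finishes, the page is processed on demand (slow path).
    return { text: this.runOcr(work, page), source: 'on-demand' };
  }

  // Simulates the background job finishing for the given pages of a work.
  completeBackgroundJob(work, pages) {
    const results = new Map();
    for (const page of pages) {
      results.set(page, this.runOcr(work, page));
    }
    this.cache.set(work, results);
    this.pending.delete(work);
  }
}
```

In this model the first request for a work is answered on demand and marks the work as pending; after `completeBackgroundJob` runs, later requests for that work come straight from the cache.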

Finally, since the tool has been unreliable for a long time now, and there may have been multiple different causes of the failures people experienced, it would be very helpful if there was widespread testing, both on works that you know failed before and on other works. There is some attention on this tool just now, so there is a good chance of getting any problems fixed, which may not be the case later on (it's a big complex codebase so it takes quite a bit of time to get familiar with it).

If you feel comfortable with that you can report problems directly in the Phabricator task linked above, but as Phabricator is a developer-oriented tool you can also simply report them here (or on my talk page if you prefer) and I'll make sure to summarise them there.

@Ineuw, Beleg Tâl: I believe you both at various times indicated interest in this issue, so pinging for "FYI" purposes.

PS. Also, please keep in mind that we have an alternate "Google OCR" gadget that you can turn on in your preferences, and each tool has its own strengths and weaknesses. I have both turned on and use whichever gives me best results on a given page. The Google OCR gadget has been very reliable in my experience, and often gives even better results. --Xover (talk) 08:40, 13 November 2020 (UTC)

Thanks for the reminder. — Ineuw (talk) 13:04, 13 November 2020 (UTC)
Well done! Mpaa (talk) 18:30, 13 November 2020 (UTC)

Systemic issue with page-scans.

Can someone check whether it is just me, or whether there is a quality problem with the scanned pages I've marked (!) in Index:Ruffhead - The Statutes at Large - vol 2.djvu? The problem is NOT present on the original scans at IA, so one solution is for someone to regenerate the file. ShakespeareFan00 (talk) 15:47, 14 November 2020 (UTC)

@Hrishikes:, @Xover: Quality issues with DjVu are generally your area of expertise? ShakespeareFan00 (talk) 15:49, 14 November 2020 (UTC)

@ShakespeareFan00: Please describe the issue you're seeing. At a cursory glance, the page images here and at IA are entirely comparable. --Xover (talk) 16:53, 14 November 2020 (UTC)
The issue I am seeing is that on the pages I've marked, the side-notes are very blurred, to the point of illegibility. The ones on IA when viewed directly are not. Whilst some compression loss is expected in DjVu, I was not expecting illegible text in places. ShakespeareFan00 (talk) 19:15, 14 November 2020 (UTC)
@ShakespeareFan00: Ah, I see. This is indeed a problem with the (IA-generated) DjVu file. IA have used settings for background separation and compression that produce really poor results on this text. It's a typical "pathological case": the scan images are slightly too low-resolution for the text to begin with, compounded by the sidenotes being in very small type and having very low contrast with the background. IA's DjVu software (LizardTech's encoder AIUI) has then tried to separate the foreground (the text) from the background (the page paper), which is an imperfect process, and to compress both separately with a lossy compression akin to JPEG in these respects. The end result is that they have produced a ~57MB DjVu from ~650MB of scan images (that's a better than 10:1 compression ratio on already compressed data!). If file size is your main concern then that's an incredibly good result, but for image quality and legibility not so much.
I can try to regenerate this from the source scans, but while that will almost certainly give better image quality (at the expense of file size), I can't guarantee that the OCR quality will be the same. The scan itself is really suboptimal on several scales that frustrate optimal OCR.
@Inductiveload: In case you don't have any to hand, here's a perfect example of the pathological case where IA's encoding fails. Zoom in on the sidenotes on this page (DjVu index 765, logical page 715). It looks like they've been written with a partially-dry magic marker, rather than printed or drawn with a pen. You can see the source at IA, showing that in addition to everything else the scan is slightly out of focus. This is why I'm deliberately not optimising for file size in my tools. --Xover (talk) 08:36, 15 November 2020 (UTC)
@Xover: yep, this is an example of a Mixed-Raster Content (MRC) encoder failing to segment the text correctly - it's basically treating that text like the background instead of as foreground. It's a shame the IA sets their (3rd party proprietary) encoder so aggressively, a "mere" 8:1 compression over JP2 would probably be a lot better. But I don't pay for their millions of hard drives, so I can't complain too much! They generally do a better job of compressing than the Big G. As you say, file size isn't critically important to us, but it does sometimes cause some pain when trying to sling files around.
If it's really necessary for the file to be regenerated and the new OCR isn't up to scratch, and the IA OCR is better and the OCR and Google OCR gadgets both do not work well, you could consider using the IA's own OCR and patching it over the new DjVu. Inductiveloadtalk/contribs 13:36, 15 November 2020 (UTC)
@ShakespeareFan00: I've generated and uploaded a new version without the aggressive compression (as you can see it's over 10x larger file size). The images should be comparable to what you see on IA now, but keep in mind you'll see a lot of cached thumbnails from the old version for a while. --Xover (talk) 14:38, 15 November 2020 (UTC)
@Xover: FYI, a compression at the c44 step (-size parameter) aiming for a roughly 250kB image produces "OK" image quality with a legible page 765 (or at least as legible as the original), for a total DjVu size of about 200MB. I only add that parameter myself if it seems I can get away with it and if, without it, the encoder is producing large files compared to the output quality. I think you can save some of the file size by allowing it to throw away JP2 compression noise. In this case, aiming for 100MB file size produced pretty nasty artifacts, but 200MB and up was OK. I don't sweat it too much either way because WMF has lots of hard drives on one hand and the IA has the original JP2s on the other. Not that there's anything wrong with not caring about file size, but there's certainly a spectrum from "compressed to total unreadability" to "multi-GB files slavishly reproducing imperceptible high-frequency image data". Inductiveloadtalk/contribs 21:50, 15 November 2020 (UTC)
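For anyone wanting to compare the figures quoted in this thread, the arithmetic is just source size divided by output size. The sketch below is only a worked illustration using the approximate sizes mentioned above (~650MB of scan images, a ~57MB DjVu, a ~200MB DjVu, and the "8:1" ratio); the function names are made up for this example.

```javascript
// Illustrative arithmetic only; the sizes are the approximate figures quoted in the thread.
function compressionRatio(sourceMB, outputMB) {
  if (!(sourceMB > 0) || !(outputMB > 0)) {
    throw new RangeError('sizes must be positive numbers');
  }
  return sourceMB / outputMB;
}

// Output size that a given ratio would imply for a given source size.
function sizeAtRatio(sourceMB, ratio) {
  if (!(sourceMB > 0) || !(ratio > 0)) {
    throw new RangeError('size and ratio must be positive numbers');
  }
  return sourceMB / ratio;
}

console.log(compressionRatio(650, 57).toFixed(1));  // "11.4" (the IA-generated file)
console.log(compressionRatio(650, 200).toFixed(2)); // "3.25" (the ~200MB variant)
console.log(sizeAtRatio(650, 8));                   // 81.25 (MB at an 8:1 ratio)
```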

Two column layouts

What solutions are there at Wikisource for two column layouts, such as parallel text editions, with both texts presented, either on the same page, or opposite pages? Obviously, on a wiki, the content needs to be presented left-right in two columns rather than on different Wiki pages, and for ebook export, there is a second challenge that on any given ebook page rendering, the text needs to be roughly the same. As languages tend to present in different but fairly consistent proportions, the text would need to "break" at certain points.

Is the only practical solution to this to use tables with columns / table rows? Or does Wikisource have other ways around this? JimKillock (talk) 11:33, 15 November 2020 (UTC)

I found the Help:Templates#Column_formatting section, but if there is any more to be said let me know, thank you. JimKillock (talk) 12:47, 15 November 2020 (UTC)
If the bilingual text is placed on opposite pages, you may try the template {{Bilingual}}, see e. g. A Book of Czech Verse/K. J. Erben or Modern Czech Poetry/Cromwell at the corpse of Charles I.. --Jan Kameníček (talk) 22:29, 15 November 2020 (UTC)
Just forgot to ping @JimKillock:. --Jan Kameníček (talk) 22:32, 15 November 2020 (UTC)

Tech News: 2020-47

15:37, 16 November 2020 (UTC)

First proposal reserved for OCR tool

The Wishlist survey just got opened for proposals, so I wanted to make sure the first one that was received was a request for a new OCR tool. Please comment there to register your thoughts. –MJLTalk 18:05, 16 November 2020 (UTC)

My biggest wish is that the Community Tech team as such gets fixed above all. They allow us to have wishes and vote about them, but they are not able to fulfill them, some of those chosen they even simply ignore (see the fifth one from the last year’s results) and after a year passes they start a new round of our vain hopes. The contributors to Wikimedia projects definitely deserve much better technical support than the annual fuss called Community Wishlist :-( --Jan Kameníček (talk) 20:04, 16 November 2020 (UTC)
@MJL, @Jan.Kamenicek: So far as I know, Community Tech are in the early phases of implementing the new OCR tool from last year's Community Wishlist and there is no need to resubmit that wish (Kaldari, do you happen to have up to date status on this?).
The problem is that they're a small team and tasked with completing all the accepted wishlist items from all the projects, some of which are rather large and complex, and some of which have dependencies on changes to other parts of MediaWiki that are owned by teams that are already up to their necks in other tasks. There are legitimate complaints to be made about the amount of resources allocated to the smaller projects, but those are properly addressed to the WMF leadership and the Board, and not individual developers doing the best they can within the limits they operate under.
But that being said, we can certainly do a lot better on our end, in specifying our wishes better, and coordinating our priorities across the different language Wikisourcen. For example, there were at least three different requests for OCR tools made last year, and it took a lot of discussion to determine whether they were similar enough to be merged. The Wikisource communities should put some effort into coordinating among ourselves before we complain too loudly. --Xover (talk) 21:39, 16 November 2020 (UTC)
I do believe that individual people of the team do their best, but the team as such does not work well, their small number being probably one of the causes. BTW there was only 1 wish from Wiktionary that made it among the five "winners" and after a year of vain waiting for fulfilling the promise they were told that they can just apply again (with zero chance as Wikipedia wishes will probably roll over smaller projects this year).
As for our end: I wrote the wish for OCR tool last year and I think I did my best to specify it as well as I could, consulting it with other people too (including you) and after discussing it with authors of the other two proposals I merged them into this one. I know it is not that much in comparison with the amount of work needed to create the tool, but contributors’ main task is to contribute and tech team’s task is to provide technical support for them. While the content of Wikimedia projects grows enormously every year, the technical backup stays far far behind.
You are right that my sighs here do not help anything, but I do not know of any systematic gathering of contributors' opinions on the tech team's work by WMF. Not long ago I was asked to fill in a questionnaire and was prepared to tell WMF everything, but it was full of questions about friendliness and harassment in Wikiprojects, and practically nothing about our satisfaction with technical support. They do not seem interested in it, otherwise they would create some forum for people to speak about these problems. Finally I managed to find some space in the questionnaire to say what I was not asked about (that the Tech Team is drowning in a large amount of work and that phabricator is flooded with unsolved problems) but I am convinced I did not tell them anything they had not known, as everybody knows it. So please forgive me that I just vented my frustration. --Jan Kameníček (talk) 22:36, 16 November 2020 (UTC)
@Jan.Kamenicek: I could write a dissertation on the apparent disconnect between the WMF (as an organisation and at the leadership level) and the actual needs of the projects, most prominently exhibited in the lack of priority given to technical areas and especially for the smaller projects. But the thing is, I doubt you'd find much disagreement on that among the engineers working for the WMF; they just can't be too vocal about that in public given that they are employees (I know they advocate for more resources for technical areas internally though).
And please don't take my criticism of the community (failing to) hold up their end as criticism about your proposal in the 2020 Wishlist. What I meant was that, for example, right now the community should be discussing and refining our wishes for the 2022 Wishlist. We should be actively gathering input from all the different language Wikisourcen on what their priorities are, and then submit a unified and well specified package of wishes that represent the needs of all the Wikisourcen (who have much more in common than with the non-Wikisourcen). And we should do some active advocacy for our most critical wishes ahead of time. That is, we should be acting as a common community of interest, and not a bunch of uncoordinated Internet people. --Xover (talk) 07:51, 17 November 2020 (UTC)
expecting the volunteers to professionally curate their needs and lobby for them internally is a little "unrealistic". we had some progress stemming from the wikisource conference, but it required resources from WMF. WMF has historically not done UX well, leading to a dysfunctional "partnership". given the half-done visual editor roll out on wikisource, i do not expect much from wishlist, but by all means continue to try, it will keep the channel open. Slowking4Rama's revenge 13:23, 18 November 2020 (UTC)
@MJL, @Jan.Kamenicek, @Xover, @Slowking4: The Community Tech team is definitely still working on a new OCR tool for Wikisource. They are behind schedule though because the Watchlist Expiry project that they just finished took most of the year to complete (as it was more complicated than expected and the team was slowed down a bit by the pandemic, especially for folks with kids). Some of the preliminary investigation work for the new OCR tool has already been completed. (See T244100 and related subtasks.) They are also currently working on Ebook Export Improvement, which they've already made substantial progress on. The team is not going to ignore any of the top wishes. It may just take a while longer to address them all. Apologies for the delays! Kaldari (talk) 02:38, 19 November 2020 (UTC)
@Kaldari: Thanks very much for some good news. When I wrote that one of the top wishes was going to be ignored, I was referring to this edit. Fortunately it seems that the decision was revoked and the status of the wish has been changed to "pending" again. --Jan Kameníček (talk) 11:08, 19 November 2020 (UTC)

Dynamic layouts in other than Main: space...

  1. Well, someone enabled it. However, there are a number of templates (notably {{left sidenote}} and {{right sidenote}}) that define specific classes and rules for ns104 (Page: namespace). Does someone have a list of these so that they or Mediawiki:Pagenumber.js can be updated accordingly? (I also have strong suspicions that some templates make sweeping assumptions as to the initial page setup they expect, and so aren't necessarily as compatible with dynamic layout as would be desirable.)
  2. Another consideration is that the current dynamic layouts parameter sets in Mediawiki:Pagenumber.js do not yet have the additional rules to make it feasible to get an "approximation" of the relevant layouts in Page: namespace (complicated by the side-by-side display of the page scans). Mediawiki output in a non-Main namespace, and crucially output that is not coming via a page tag, also lacks the column and region container elements that some of the layouts rely on.
  3. I am also needing some assistance in figuring out how to create my own 'layout' for test purposes, because the approach given in Help:Layout didn't actually work. Despite defining a custom layout per the documentation, it failed to show up as an option when looking through the available ones on a page.
  4. Also, currently whilst dynamic layouts seem to be active in Page: namespace, they don't seem to be fully active when editing or previewing in that namespace, which means that it's not as easy to check if a desired transcription will render correctly. Perhaps this is something that's being worked on?

ShakespeareFan00 (talk) 17:48, 17 November 2020 (UTC)

Wikisource:TemplateScript

The instructions here no longer work. I have

mw.loader.load('//en.wikisource.org/w/index.php?title=MediaWiki:TemplateScript/proofreading.js&action=raw&ctype=text/javascript');

in my common.js, but sometimes the relevant entries fail to appear on my sidebar. What changed recently in terms of the sidebar that means the load method here no longer works as designed?

Alternatively, can someone provide me with a list of the EXACT settings to use in preferences so that things actually work as the documentation claims? Thanks. ShakespeareFan00 (talk) 08:58, 18 November 2020 (UTC)

This works for me. I get the following entries in the sidebar with this line in my JS:
Page tools ⛭
    Add header
    Add footer
    Clean up OCR
    Make reference
    Convert to small-caps
    Convert to uppercase
Perhaps you need to flush your cache? I have found Firefox recently to be very reluctant to flush the JS cache without opening Developer Tools (F12) and setting it to "Disable HTTP Cache (when toolbox is open)". Inductiveloadtalk/contribs 09:21, 18 November 2020 (UTC)

Flawed template logic...

How are you supposed to check for extended usage of a template's additional parameters?

I added a line to {{Outside L}} and {{Outside R}} to do this, but it seems to be acting randomly: it is putting into the relevant category items that I have checked and that do not use the additional parameters concerned.

Is the logic I've used flawed in some way, because I was trying to figure out if the code in these templates can be made simpler using templatestyles? Checking for actual usage is part of this. ShakespeareFan00 (talk) 15:48, 18 November 2020 (UTC)

Call for insights on ways to better communicate the work of the movement

ELappen (WMF) (talk) 18:56, 18 November 2020 (UTC)

Is there really no way?

Is there really no way to have a bot or shortcut link to automatically add archive.org PDFs as proofreadable scans to Wikisource with only a click or two? I'm looking at Author:Goldsworthy Lowes Dickinson, which has links to the external source at Archive.org for each single work, yet none of them are on Wikisource, and it seems like it would take hours to import all the source documents and get them even just set up to start being proofread. Wouldn't that be a tool worth creating to mass import such things? Peace.salam.shalom (talk) 05:08, 20 November 2020 (UTC)

@Peace.salam.shalom: There is such a tool: IA-Upload. It's still slightly fiddly, but combined with the "Fill Index" and "Import Pagelist" gadgets (enable in your settings), it's not too bad. I'm working on a way for users to provide me with a CSV file and I can run a batch job, but it's not there yet.
Some of the books have been uploaded to Commons already by Fæbot, for example File:Religion - a criticism and a forecast (IA religioncriticism00dick).pdf. With the two gadgets mentioned above, setting up an index isn't too miserable. Inductiveloadtalk/contribs 05:16, 20 November 2020 (UTC)
Oh, perhaps I am just confused then because there is not a link to see those scans (ie, Religion - a criticism and a forecast) on his userpage? There's not an automatic link [click here for the scans to be proofread] thing? Peace.salam.shalom (talk) 05:26, 20 November 2020 (UTC)
There is no bot to put links from enWS to IA. On an author page, feel free to use {{ext scan link}} to add a link after the name of a work. There is no real value in a bot, as the detail at IA is never impeccable, never unique, and full of variation. — billinghurst sDrewth 05:38, 20 November 2020 (UTC)
The scans were uploaded to Commons fairly recently by Fæbot, which doesn't create the Index pages for Wikisource. Formatting an Index page is virtually impossible to do using only the mediocre bibliographic data at the IA. Additionally, the page lists nearly always need some kind of manual intervention even in the best cases, and the scans need at least a cursory check for missing pages. With the two gadgets I mentioned above, most of the heavy lifting is done for you, but you still need to tidy it up manually in places.
Once an index page is created, you can change {{ext scan link}} to {{small scan link}}. Inductiveloadtalk/contribs 05:44, 20 November 2020 (UTC)
that being said, the cross project task of uploading works and transcribing them, and getting them found, is opaque and obtuse. it would be nice to have a query of works on IA and commons, that need an index, with a tool, or dashboard for an author page, so we could have a system, rather than rely solely on individual initiative. Slowking4Rama's revenge 02:08, 22 November 2020 (UTC)

Community Wishlist Survey 2021

The 2021 Community Wishlist Survey is now open! This survey is the process where communities decide what the Community Tech team should work on over the next year. We encourage everyone to submit proposals until the deadline on 30 November, or comment on other proposals to help make them better. The communities will vote on the proposals between 8 December and 21 December.

The Community Tech team is focused on tools for experienced Wikimedia editors. You can write proposals in any language, and we will translate them for you. Thank you, and we look forward to seeing your proposals!

SGrabarczuk (WMF) 05:52, 20 November 2020 (UTC)

Hello, folks! My name is Ilana, and I'm the product manager for the Community Tech team. I would just like to clarify (in case there's any confusion) that the team will still work to address the Wikisource wishes from the 2020 survey. In other words, the 2020 Wikisource wishes will not be forgotten or neglected! In fact, we're already working on some wishes. The ebook export project is in progress, and we've begun research for the OCR project. We'll also work on addressing the other wishes in the future. For this reason, there's no need to resubmit the 2020 wishes (from the top 5) in the 2021 survey. However, we look forward to reading and reviewing new Wikisource wishes in the 2021 survey. Thank you! --IFried (WMF) (talk) 00:36, 21 November 2020 (UTC)

Wikimedia meet

Just a note that one of the available tools for Wikimedians is Wikimedia meet. I am not certain whether or not we have an uncalled need for some face-to-face sessions. I do wonder whether offering some occasional tutorial sessions could support new users. What are people's thoughts? — billinghurst sDrewth 02:07, 22 November 2020 (UTC)

it would be good to have a monthly call with meta:Wikisource Community User Group. then we could organize responses to strategy, and wishlist. User:VIGNERON (at IFLA) did a wikicite / wikisource on youtube. youtube.com/watch?v=fxyVlAHWz38 (disappointed by your filter of youtube again) some refresh of tutorials would be nice, also. Slowking4Rama's revenge 16:47, 22 November 2020 (UTC)

Tech News: 2020-48

17:19, 23 November 2020 (UTC)

IA Upload not working

The IA Upload tool has not worked for me for several days; it seems from its logs not to be working for many other people, if not everyone. I've raised a Phabricator ticket. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:39, 23 November 2020 (UTC)

Bot-y Things?

I just noticed that not all the pages titled "Letter to..." are in Category:Letters and it seemed like the sort of thing a bot/automated task should be set up to not only do the backlog now, but keep up with in the future when somebody adds such a letter. Not all letters will be titled "Letter to...", but all works titled "Letter to..." or "Correspondence of/between..." can be safely assumed to belong in the category. But I have no idea where to find bot lists. Peace.salam.shalom (talk) 15:34, 24 November 2020 (UTC)
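The title rule proposed above could be sketched as a simple check like the one below. This is only an illustration of the matching rule ("Letter to...", "Correspondence of...", "Correspondence between..."), not an existing bot: the function name is hypothetical, and a real task would still need a way to list pages and edit them.

```javascript
// Hypothetical helper illustrating the proposed title rule; not an existing bot.
// Matches titles that begin with "Letter to", "Correspondence of" or "Correspondence between".
const LETTER_TITLE_PATTERN = /^(Letter to|Correspondence (of|between))\b/;

function belongsInLettersCategory(title) {
  return LETTER_TITLE_PATTERN.test(title.trim());
}

console.log(belongsInLettersCategory('Letter to John Smith'));           // true
console.log(belongsInLettersCategory('Correspondence between A and B')); // true
console.log(belongsInLettersCategory('Letters from the front'));         // false
```

The pattern is anchored at the start of the title and is case-sensitive, so a title such as "A Letter to..." would not match; whether that is desirable is a judgement call for whoever sets up such a task.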

We would not typically put all subpages of a work into a category. We would usually do the top level of the work only for this example. If there is a subpage of the work that is different from the parent, then we may do it. — billinghurst sDrewth 01:22, 26 November 2020 (UTC)

SpBot and section resolved

Does the SpBot take into account the template {{section resolved}} when archiving discussions from this page? --Jan Kameníček (talk) 14:55, 26 November 2020 (UTC)

I am asking because WS:Copyright discussions has a notice "SpBot archives all sections tagged with {{section resolved|1=~~~~}} after 7 days" at the top, while this page does not contain such a notice and some sections have not been archived yet although the template was placed there almost a month ago. I have placed the template to one of the sections recently so I would just like to know whether it is going to have any effect. --Jan Kameníček (talk) 07:54, 27 November 2020 (UTC)
@Jan.Kamenicek: The behaviour of SpBot is configured per-page with the template {{autoarchive resolved section}}. The two main parameters of interest here are age and timecompare. age defines the number of days after which SpBot considers a thread eligible for archiving. When making this assessment it refers to the value of the timecompare parameter: if it has the value resolved the bot determines the age of the thread from the timestamp in the {{section resolved}} template, and if it has any other value (including empty) it looks at the oldest timestamp in the thread (actually, I think that's a documentation error; I'm pretty sure it looks at the newest timestamp).
In practice this means threads on WS:CV are never archived, unless they contain {{section resolved}} in which case they are archived 7 days after the date in the template. Threads on WS:S are archived 31 days after the last timestamp in the thread (modulo the doc confusion mentioned above). So for the thread you were interested in, adding {{section resolved}} probably postponed archiving until 31 days after you added the template, rather than let it be archived 31 days after the last regular message in the thread. For this reason I usually use {{closed}} rather than (well, in addition to) {{section resolved}} for threads that have outlived their productive lifetime on WS:S. --Xover (talk) 10:31, 27 November 2020 (UTC)
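The rule as described in the two messages above can be sketched roughly as follows. This is a model of the explanation only, not SpBot's source code: the function name, the thread fields `timestamps` and `resolvedAt`, and the use of "greater than or equal" for the age comparison are assumptions, and it follows the reading that the newest timestamp is used when `timecompare` is not `resolved`.

```javascript
// Rough model of the archiving rule as explained above; NOT SpBot's actual code.
const DAY_MS = 24 * 60 * 60 * 1000;

// thread: { timestamps: number[] (ms, non-empty), resolvedAt: number | null (ms) }
// config: { age: number (days), timecompare: string }
function isEligibleForArchiving(thread, config, nowMs) {
  let referenceMs;
  if (config.timecompare === 'resolved') {
    // Only the {{section resolved}} timestamp counts; without it, never archive.
    if (thread.resolvedAt === null) return false;
    referenceMs = thread.resolvedAt;
  } else {
    // Assumed: newest timestamp in the thread, including the one in the template.
    const all = thread.resolvedAt === null
      ? thread.timestamps
      : thread.timestamps.concat([thread.resolvedAt]);
    referenceMs = Math.max(...all);
  }
  return nowMs - referenceMs >= config.age * DAY_MS;
}

// Settings as described for the two pages.
const copyrightDiscussions = { age: 7, timecompare: 'resolved' };
const scriptorium = { age: 31, timecompare: '' };
```

Under this model, a Scriptorium thread whose last message is 36 days old would be eligible, but adding {{section resolved}} 15 days ago would reset the clock and make it ineligible again, which matches the postponement described above.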
@Xover: Thanks for the explanation. Do I understand it right that the correct way of accelerating uncontroversial archiving is using {{closed}}? --Jan Kameníček (talk) 23:29, 27 November 2020 (UTC)
@Jan.Kamenicek: No, {{closed}} doesn't affect archiving either; but it signals that the discussion is over and no further input is desired. You can also use {{cot}} and {{cob}} to just collapse the section contents without the quasi-official connotation of {{closed}}. PS. Apologies for the tardy reply. --Xover (talk) 16:45, 1 December 2020 (UTC)
I see. However, what I originally wanted to achieve was speeding up the archiving so that the discussion in the archives could be linked to, because it does not make sense to link here as it is going to disappear from here after some time. So I will wait until the bot does its job. --Jan Kameníček (talk) 16:55, 1 December 2020 (UTC)
@Jan.Kamenicek: Ah. In that case, you could use a permanent link that points to the relevant section at a specific revision. It requires manually finding the revision id and constructing the link, which is really hard to figure out the first time and really tedious and fiddly every time you want to do it. I made myself a script to make it easier, but it's rather buggy in some places. If you want to try it out you can stuff mw.loader.load('//en.wikipedia.org/w/index.php?title=User:Xover/EasyLinks.js&action=raw&ctype=text/javascript'); into your common.js. No warranties, but it'll give you a link next to each section heading labelled "permalink", which when clicked puts a permanent link to that section on your clipboard. For example, here is a link to this section as it was right before I posted this message: SpBot and section resolved. You can also get non-permanent links to a section, and links to a diff (when viewing a diff). At some point I'll get around to polishing it up suitable for others to use, but it works well enough for my needs so I keep putting it off. :) --Xover (talk) 17:16, 1 December 2020 (UTC)
@Xover: Creating a link to the section at a specific revision is a very good idea. I think I found quite an easy way: I found and opened the revision, and then clicked on the title of the section in the Contents box, which generated the desired link in the browser's address bar (https://en.wikisource.org/w/index.php?title=Wikisource:Scriptorium&oldid=10674984#Self-published_lectures). Thanks very much for the advice. --Jan Kameníček (talk) 18:46, 1 December 2020 (UTC)
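
The link shape described above (page title, revision id in `oldid`, section anchor after `#`) can also be assembled by hand. The following is only a minimal sketch: `buildSectionPermalink` is a hypothetical helper name, not part of MediaWiki or of the EasyLinks script, and the anchor handling is an assumption that only replaces spaces with underscores. Real section anchors can differ, for example when a heading is repeated on the page or contains markup.

```javascript
// Hypothetical helper: build a permanent link to a section at a fixed revision.
// Assumes the wiki's index.php URL is known and that the section anchor is the
// heading text with spaces replaced by underscores (a simplification).
function buildSectionPermalink(indexUrl, title, oldid, sectionHeading) {
  if (!Number.isInteger(oldid) || oldid <= 0) {
    throw new Error("oldid must be a positive integer revision id");
  }
  // Page titles use underscores for spaces; keep ':' and '/' readable.
  const encodedTitle = encodeURIComponent(title.replace(/ /g, "_"))
    .replace(/%3A/g, ":")
    .replace(/%2F/g, "/");
  const anchor = encodeURIComponent(sectionHeading.replace(/ /g, "_"));
  return indexUrl + "?title=" + encodedTitle + "&oldid=" + oldid + "#" + anchor;
}

// Usage, with the values from the link quoted in the discussion:
console.log(
  buildSectionPermalink(
    "https://en.wikisource.org/w/index.php",
    "Wikisource:Scriptorium",
    10674984,
    "Self-published lectures"
  )
);
// prints https://en.wikisource.org/w/index.php?title=Wikisource:Scriptorium&oldid=10674984#Self-published_lectures
```

The revision id still has to be looked up manually, for instance from the page history, as noted in the thread.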

Moynihan report 1965

I thought I might find the report written by Daniel Patrick Moynihan, commonly referred to as the Moynihan report 1965, and more formally as The Negro Family: The Case For National Action. The report is discussed briefly in his biography, and in more detail in w:en:The_Negro_Family:_The_Case_For_National_Action. Both articles have links, although I just added a discussion to the respective talk pages noting that the original link is dead and proposing to replace it with a working link. Most of the material is captured in Internet Archive links, although a list of references available at the original is missing from the Internet Archive. I'm bringing this up here for two reasons:

  1. Given that the work is public domain, and a very important document, it would seem appropriate to include it here
  2. I'm particularly interested in the "numerous tables and graphs" mentioned on the bottom of the first page of the document which are not shown at the Department of Labor site. I think those would also be useful — I'm hoping someone here has more experience with how to track them down and add them.--Sphilbrick (talk) 14:36, 30 November 2020 (UTC)
The US DOL link has only a Web 1.0 text dump [17]. It is not scanned at IA [18]; here is a Google Books copy [19] (you could download it and upload it at IA and Commons). - Slowking4Rama's revenge 15:14, 1 December 2020 (UTC)
  • The Google Books scan looks good; Inductiveload, could you create a DJVU from this file (and remove the Google Books page at the beginning)? TE(æ)A,ea. (talk) 00:55, 2 December 2020 (UTC).

Wikidata descriptions changes to be included more often in Recent Changes and Watchlist

Tech News: 2020-49

17:45, 30 November 2020 (UTC)

United Nations Treaty Series

Please see also c:Commons:Village pump/Copyright#The years of registration vs. publication of the United Nations Treaty Series since Volume 401. Perhaps we may host treaties registered in recent copyright-declared volumes only from different sources, like US governmental sources for American treaties. Volumes 1121, 1199, 1200, and 1237 were registered in 1979 to 1981 but published in 1996 or 1998 with copyright notices.--Jusjih (talk) 21:31, 30 November 2020 (UTC)

Algiers Accords - PD?

Do we think this document is in the public domain? It is an agreement (though short of a treaty) between the United States and Iran, brokered by the Algerian government, with the document officially having been produced by the Algerian government. BD2412 T 06:45, 2 December 2020 (UTC)

Feedback requested on November update for Wikisource ebook export project

Hello, everyone! The Community Tech team is requesting your feedback on the recently posted November update for the Wikisource ebook export improvement project. Your feedback is very important to us. We want to know what you think of some work we have recently completed to improve the reliability of WS-Export and font support in various languages. Additionally, we want to know what you think of our proposed mockups to improve the download user experience. So please do check out the updates, if you can, and share your feedback on the project talk page. Thank you! --IFried (WMF) (talk) 18:23, 2 December 2020 (UTC)