Announcements

Do you create PDFs on Wikimedia wikis?

Hi everyone, I’m looking for feedback from people who use the function to create PDFs on the Wikimedia wikis, which feels relevant for Wikisource. In short, the main technology we’re using to render them – OCG – is breaking down. The code is old, it’s difficult to maintain, and if we don’t replace it now we might suddenly find ourselves in a situation where we'd have to take it down without having planned to do so.

We have some plans for the future over at mw:Reading/Web/PDF Functionality. If you care about the PDF function, please head over there and tell us on the talk page if anything is missing, or if there’s something in there we shouldn’t spend our time and energy on. /Johan (WMF) (talk) 12:19, 18 May 2017 (UTC)

Proposals

Bot approval requests

Repairs (and moves)

Other discussions

Ship track upload as documentary source?

I'm about to receive a track of the ACX Crystal, recently involved in a collision in Japanese waters. Would this be proper to upload here as a "documentary source"? I expect it to be in a tabular format that can then be converted to a graphic, but not yet plotted as a graphic. - Bri (talk) 18:03, 19 June 2017 (UTC)

what is the license? if it is a document of tabular data, you could argue for PD in the US, but the pdf of the document would go to commons first. or do you want to upload here as "fair use"? Slowking4SvG's revenge 11:42, 20 June 2017 (UTC)
Further to this, we don't allow "fair use" on Wikisource, and we also don't allow reference material such as tables of data unless it is published as part of a complete source text. —Beleg Tâl (talk) 12:25, 20 June 2017 (UTC)
but we very well could, would, and should. given the propensity of commons to delete books in use, it is a matter of time. Slowking4SvG's revenge 19:01, 20 June 2017 (UTC)
If it's added to Commons as '.map' data, it'd be plotted automatically. Like commons:Data:Wikimedians.map for example. I'm not sure Wikisource is the place for pure data. Sam Wilson 12:30, 20 June 2017 (UTC)
Commons:Structured data is acceptable to upload to Commons; the usual copyright rules apply. I would not think that a track would be copyrighted, as facts are not copyrightable. — billinghurst sDrewth 05:05, 24 June 2017 (UTC)

Problem with a pdf file

A PDF file has a problem! When I download it and go to page 172 in Acrobat Reader, I can see the page, but on Wikisource no page is shown. This is the page address in fa.wikisource.org. Please help me to solve it. --Yousef (talk) 11:17, 22 June 2017 (UTC)

The page is visible to me. You need to purge your cache. Hrishikes (talk) 12:18, 22 June 2017 (UTC)

Search results from this project now active in English Wikipedia

Just to let you know, as announced via mailing list service, English Wikipedia is now receiving search results of this project, Wikisource, intended to direct Wikipedia users to this project. Currently, an option to suppress the search results of this project from the English Wikipedia search system is proposed at Village pump's "proposal" subpage, where I invite you to comment. --George Ho (talk) 19:04, 22 June 2017 (UTC)

How do you contribute to Wikisource?

Hi everyone,

I have been proofreading a few pages here, but I feel like I don't understand really how this place works. There are many many projects started, some of them lingering for years. I don't even know how to find out how many books are finished, how many books are ongoing. It seems like a lot of people work for some pages on a book, alone, then very often give up, because this is a very long and sometimes boring task. Apart from a few discussions on the Current Collaborations, I don't see where people talk, so I don't feel like there is an active community. Am I missing a magical place where people discuss, exchange, organize?

A few years ago, I participated in PGDP, where there is a very active forum, with a thread for each project where the different proofreaders can discuss formatting or the difficulties of reaching a consistent result, or even just share the most interesting or funny quotes from the books they are working on. There were also some specialized teams, like one named the gravediggers if I remember correctly, which focused on the oldest projects, or teams for texts on a specific topic, which could gang up on a given book at the same time. This was made possible by the existence of statistics at the book level, not only at the page level.

So:

  • Is there a lot of discussion and organisation going on somewhere I don't know (other talk pages? IRC? mailing-lists?)
  • Would you be interested in statistics at the project level? (e.g. list of projects with the progress percentage, so that we can quickly finish works almost done, or focus on the oldest ones). I think I could code something giving regular updates. Actually, does it exist in other wikisources?

Koxinga (talk) 20:12, 22 June 2017 (UTC)

This very page (the Scriptorium) is our central discussion forum. You've come to the right place! Discussions regarding a specific project are done on the Index talk page. Bigger projects are organized as WikiProjects. Other discussion forums and lists of places to contribute are listed at Wikisource:Community portal. I'll let someone else speak to statistics as I don't know much about that. The best place to contribute if you don't know where to contribute is probably the proofread of the month. —Beleg Tâl (talk) 21:02, 22 June 2017 (UTC)
dashboard for wikisource progress? yes please! the example that comes to mind is Wikisource:WikiProject DNB/Statistics and Wikisource:WikiProject DNB/Progress. but in general we are too disorganized to do actual reporting, except ad hoc. some tools for project management & progress communication would be fine. we should really do a wish list, or you could write an idealab - quick grant, if you could write up your own scope. Slowking4SvG's revenge 22:20, 22 June 2017 (UTC)
Of course I know about the talk pages and the Scriptorium, but it is just so empty. There is no feeling of community here. Koxinga (talk) 22:43, 23 June 2017 (UTC)
The Special Pages link on the left-hand side gives you access to a lot of interesting information, and List of index pages in particular is the page to see if you want to find projects at various stages of completion. I think one of the strengths of English Wikisource is that it (usually) allows you to start and work on all sorts of projects autonomously, but that does result in a lot of unfinished projects and makes the community spirit a little hard to see at times. I've put up a lot of index pages that I'd like to work on "some day" and a couple of times I've come across one that someone has taken on and finished, which was extremely gratifying. One thing I do to contribute is search for common scan errors and correct them. One of my favorites has been "thou earnest" for "thou camest". That's a good way to get a glimpse of a lot of interesting material. Anyway, I hope you'll be sticking around, and I agree that more community interaction would be a good thing! Mudbringer (talk) 01:34, 23 June 2017 (UTC)
To add to this, a lot of editors will add a list of the projects they're working on to their user page, so you can get an idea of what people are up to by looking there. Special:RecentChanges will also show what people are currently working on. —Beleg Tâl (talk) 11:54, 23 June 2017 (UTC)
Yes, that's exactly what I mean. It is very gratifying to see someone else working on the same project. On the other hand, I came back after a hiatus of a year to find that not a single page had been proofread in the meantime. I do work on some rather specific topics, with Chinese characters that might frighten some contributors, but still, this is rather disheartening. Koxinga (talk) 22:43, 23 June 2017 (UTC)
I enjoy contributing to Wikisource, it's one of my favorite pastimes. I like the idea of adding works here and making them available for future generations. Maybe someone 100 years down the line will be reading some of the works we've been adding. I also like the idea of being able to read works I've never read before while at the same time making them available for other readers. But it has to be enjoyable for me, so I mainly work on subjects I'm interested in and, as you mentioned, I often might start a book, lose interest, and then just forget about it. I don't care. This isn't a job, I don't have to contribute if I don't want to, I can wake up tomorrow and never contribute to Wikisource again and probably no one will ever notice. I don't want deadlines here, I have them at work. I like contributing here to get away from work and relax. So basically Wikisource for me is something enjoyable to do in my free time, and being forced to finish a work, or to work on books we're not interested in just to get them done, is the wrong way to go for me. Don't get me wrong, we should strive to get the works we're working on finished, but if we don't or can't, who cares, someone else will probably get it done down the line. Jpez (talk) 11:26, 23 June 2017 (UTC)
I am not talking about setting deadlines or anything like that. It is fine if your motivation is entirely internal and you can work alone at your own pace. However, I do think we would get more contributions with more reporting on what is going on, which projects are moving forward, which projects are close to completion, etc. Koxinga (talk) 22:43, 23 June 2017 (UTC)
welcome to smaller wikis. there is less chatter and drama, and more work done. a little coaching (management) would be welcome. people tend to ask for help here, ad hoc, rather than do systematic reporting; people team up to get a project done. we could use a wikisource newsletter, or progress dashboard. if you could make some tools to report project progress semi-automatically, rather than by hand, that would be a big help. Slowking4SvG's revenge 15:19, 26 June 2017 (UTC)
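
The project-level statistics discussed in this thread could be sketched roughly as follows. This is only an illustration: the page counts are made-up sample data, the names `IndexStats` and `progress_report` are hypothetical and not part of any existing Wikisource tool, and treating a page as "done" once it is proofread or validated is an assumption about how progress would be measured.

```python
from dataclasses import dataclass


@dataclass
class IndexStats:
    """Page counts for one Index page (all values below are hypothetical sample data)."""
    title: str
    not_proofread: int
    problematic: int
    proofread: int
    validated: int

    @property
    def total(self) -> int:
        # Pages without text are left out of this sketch entirely.
        return self.not_proofread + self.problematic + self.proofread + self.validated

    @property
    def percent_done(self) -> float:
        # Assumption: a page counts as "done" once it is proofread or validated.
        if self.total == 0:
            return 0.0
        return 100.0 * (self.proofread + self.validated) / self.total


def progress_report(indexes):
    """Return unfinished indexes, with the ones closest to completion first."""
    unfinished = [ix for ix in indexes if ix.percent_done < 100.0]
    return sorted(unfinished, key=lambda ix: ix.percent_done, reverse=True)


sample = [
    IndexStats("Index:Example A.djvu", not_proofread=5, problematic=0, proofread=80, validated=15),
    IndexStats("Index:Example B.djvu", not_proofread=150, problematic=10, proofread=30, validated=10),
    IndexStats("Index:Example C.djvu", not_proofread=0, problematic=0, proofread=40, validated=60),
]

for ix in progress_report(sample):
    print(f"{ix.title}: {ix.percent_done:.1f}% of {ix.total} pages")
```

Sorting by percentage in descending order puts the nearly finished works at the top, which matches the stated goal of quickly finishing works that are almost done; sorting by creation date instead would serve the "oldest projects first" idea.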

License tags in Translation space

What is the best way to put license tags in Translation space? The original work needs an explicit license tag, but I'm not sure about the translation itself. I assume it will always be CC-BY-SA-3.0 and GFDL, but I've seen some editors explicitly release it into PD. Is this allowed? Should the CC-BY-SA-3.0/GFDL licenses be explicitly tagged? I've been tagging them explicitly, as below, but I just want to see if others have a better way.

{{translation license
| original = {{PD-old}}
| translation = {{CC-BY-SA-3.0}}{{GFDL}}
}}

Beleg Tâl (talk) 13:22, 23 June 2017 (UTC)

Our rider on saving is: "By saving changes, you agree to the Terms of Use, and you irrevocably agree to release your contribution under the CC BY-SA 3.0 License and the GFDL. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license." So that is what applies to contributor work in the Translation: namespace. Until we update that, that is what it is. — billinghurst sDrewth 22:33, 23 June 2017 (UTC)

The Time Machine (Heinemann text)

Hrishikes has brought an issue to my attention, which I have looked into as well. This is a bit complicated, so I will summarize, then say more at length.

Summary: Our copy of The Time Machine (Heinemann text) is not the 1895 Heinemann text of the novel by H. G. Wells, but seems rather to be the 1924 revised "Atlantic" text included in an omnibus edition The Time Machine, The Wonderful Visit and Other Stories published by T. Fisher Unwin. [17] As H. G. Wells died in 1946, his works are in PD in the UK. The omnibus was printed in the UK in 1924, and does not seem to have had copyright renewal in the US. So it may be in PD in the US. Hrishikes has located a scan of the Heinemann text and started transcription. So, if our copy of the "not-Heinemann" (Atlantic) text is in PD in the US, then we need to move it to a new location and make room for the actual Heinemann text. But if it is not in PD, then it should be deleted. As an added wrinkle, the "not-Heinemann text" is a Wikisource Featured text.

Identity of the text located at The Time Machine (Heinemann text): It is easily seen that our current copy is not the Heinemann text. Compare the table of contents for the actual Heinemann text with the one on our current copy. The number of chapters and their presentation are completely different. The Heinemann text has 16 chapters with chapter titles, but our copy has 12 chapters without titles. Neither did the 1895 Holt text have 12 chapters. The earliest edition with 12 chapters seems to be the "Atlantic" text that was the result of a revision. The "Atlantic" text may be seen here in an electronic version that preserves the original pagination and page headers.

The Atlantic text and copyright: The "Atlantic" text was published as part of an omnibus edition of Wells' works in the UK in 1924. Details of that publication may be found here. I do not know whether the text was simultaneously published in the US, possibly under a different title, or whether copyright was applied for at that time. However, a search has turned up no evidence of a renewal for that volume. If so, then it seems the copyright in the US for the Atlantic text has expired. The original text was published in 1895, so it would be PD in the US as well, and all of Wells' works entered PD in the UK at the beginning of this year, as it has now been more than 70 years since his death.

Proposed actions:

(1) Feedback and confirmation of findings thus far. Is our text the Atlantic text?
(2a) If our text is the Atlantic text, and in PD, then propose moving it to The Time Machine (Atlantic text), and then proofreading and transcluding the actual 1895 Heinemann text to The Time Machine (Heinemann text) from the scan Index:The Time Machine (H. G. Wells, William Heinemann, 1895).djvu begun by Hrishikes.
(2b) If our text is not the Atlantic text, or is but not in PD, then delete it and proceed with adding the actual Heinemann text from scan etc.
(3) Decide about Featured status for the text. (Let's wait on that discussion until we know whether we're following 2a or 2b).

Original discussion: User talk:EncycloPetey#The Time Machine (Heinemann text). --EncycloPetey (talk) 17:38, 23 June 2017 (UTC)

1- inclined to agree based on chapters, but could not find an internet archive version, or at hathi trust, and not near me at worldcat [18]
2- i would be inclined to keep both, and change the header data for the reprint. (is it Heinemann text, published by Atlantic?)
2- do not see a reason for deletion (although there is a Scribner 1924 edition)
3- we can have delisted featured, we should think about all the old versions not transcluded from page scans
4-- i imagine we will have more of this, as we research editions. (and as our scholarship improves) the metadata at internet archive is so bad, people could be easily confused. Slowking4SvG's revenge 17:54, 23 June 2017 (UTC)
It's not the Heinemann text. The two texts are completely different editions, even having a different number of chapters (16 versus 12). The concern over deletion is that, if this is a 1924 publication, and if copyright was renewed, this edition might not be in PD yet. My research didn't turn up anything, but someone else's search might do so. --EncycloPetey (talk) 18:33, 23 June 2017 (UTC)
if you did not find anything, that is good enough for me. under the current US copyright search, that is the best result you can get. there is no positive proof of non-renewal. we have to set the standard of "good faith search" even if there is a very small chance of facts emerging. this is the standard of hathi trust. Slowking4SvG's revenge 17:26, 24 June 2017 (UTC)
I'd prefer The Time Machine (1924) as the page name, but aside from that I agree with your assessment and support your proposed actions. —Beleg Tâl (talk) 18:20, 23 June 2017 (UTC)
Unless we can verify for certain that the text is specifically from a 1924 edition, I'd hesitate on adding a date to the filename. Doing so might require further changes to the name later, if research turns up additional information. But if we can verify that it is the "Atlantic text", from any edition of that text, then the proposed name will work regardless of the actual date. --EncycloPetey (talk) 18:33, 23 June 2017 (UTC)
  •   Comment It is now an edition of a work with an uncertain source; we could just delete it if it doesn't bring true value. With regard to its copyright status, that does not change whether it is a 1924 version or not: the copyright will always be that of the original version. Any copyright in the remainder of the suspected publication will depend on each of the components, and the renewal aspects. — billinghurst sDrewth 22:30, 23 June 2017 (UTC)
    As far as I am aware, the 1924 edition was a complete revision of the text by Wells himself, and not merely an editorial version. Does that affect the possibility of copyright? --EncycloPetey (talk) 22:35, 23 June 2017 (UTC)
    If it wasn't published before 1923, and wasn't previously published in an authorized version in the US, the URAA would have restored it. It's hard to say where the line is legally between a non-copyrightable new version and copyrightable changes, but decent revision should do it. It will be out of copyright in the US in 2020.--Prosfilaes (talk) 01:19, 24 June 2017 (UTC)
    Expert opinion from H. G. Wells's The Time Machine: A Reference Guide (2004) by John R. Hammond, page 19:

In the original edition of The Time Machine, published by Heinemann in 1895, the text was divided into sixteen chapters, and each chapter was given a title. When Wells revised his novels for a collected edition in 1924, the Atlantic Edition, he retained the text of The Time Machine virtually unaltered but reduced the number of chapters from 16 to 12, eliminating the chapter titles.

Most modern editions follow Wells's revision in dividing the text into twelve chapters. In the discussion that follows chapter references follow this practice.

A comparison of the chapter divisions is as follows:

Heinemann                            Atlantic
1   Introduction                     1
2   The Machine                      1
3   The Time Traveller Returns       2
4   Time Travelling                  3
5   In the Golden Age                4
6   The Sunset of Mankind            4
7   A Sudden Shock                   5
8   Explanation                      5
9   The Morlocks                     6
10  When the Night Came              7
11  The Palace of Green Porcelain    8
12  In the Darkness                  9
13  The Trap of the White Sphinx     10
14  The Further Vision               11
15  The Time Traveller's Return      12
16  After the Story                  12
    Epilogue                         Epilogue

As per above, Heinemann chapter divisions were original, but Atlantic chapter divisions are currently in vogue. "Virtually" no difference in text. So I propose that the text may be migrated to scan, with title unchanged, along with additional chapters. Two pages are missing in the scan, which I am going to fix with blank placeholders. The blanks may be proofread from the Atlantic text. Hrishikes (talk) 02:00, 24 June 2017 (UTC)

The disadvantage of that approach is that we will have no copy of The Time Machine with the chapter divisions that are now in vogue. If we can legally retain a copy of the Atlantic text, then we should do so for this reason. --EncycloPetey (talk) 02:03, 24 June 2017 (UTC)
Wells's books are PD-UK. But the policy here is PD-US. Non-US texts need not have copyright registration/renewal in the U.S.; the copyright is restored by the URAA for 95 years after publication. So we have to assess whether modification of chapter divisions, without alteration of text, amounts to significant change, attracting copyright. If the change is deemed significant, then we cannot retain this text. Anyway, the reduction in chapter number and elimination of chapter titles in the currently-in-vogue version of the work may be mentioned in the header note; that should suffice.
P. S. It seems that the Atlantic edition was published in U. S. in the same year (1924) by Charles Scribner's Sons (details at http://www.isfdb.org/cgi-bin/pl.cgi?614641) without copyright notice/renewal. Hrishikes (talk) 03:20, 24 June 2017 (UTC)
Adding chapter names might have been copyrightable, but removing them wouldn't be, and splitting a few chapters in two pieces wouldn't be either. I don't know whether that copyright renewal would have been needed, since it's within 30 days of first publication, but the changes don't seem copyrightable.--Prosfilaes (talk) 00:32, 25 June 2017 (UTC)
This site gives a date of October 15, 1924 for the first two volumes in the Atlantic Edition of The Works of H. G. Wells, which includes the text in question. —Beleg Tâl (talk) 01:52, 27 June 2017 (UTC)

Proposed action:

Given that: (a) the original work is PD in both UK and US, (b) the "Atlantic" text seems not to differ substantially except by removal of chapter titles and positioning of breaks, I propose we take the following actions:

(1) Move The Time Machine (Heinemann text) to The Time Machine (Atlantic text) to preserve this version.
(2) Add to the empty The Time Machine (Heinemann text) the front matter from the 1895 scan.
(3) Paste into each chapter subpage the relevant Atlantic text, then split-and-match to the Page namespace of the scan.
(4) Proofread the result against the Heinemann text scan, keeping alert for differences.
(4a) If proofreading demonstrates that the Atlantic text is indeed identical or inconsequentially different from the Heinemann text, then we keep both.
(4b) If proofreading reveals significant editorial changes, we can then delete the Atlantic text at its new location, perhaps moving a copy to Wikilivres, and restoring it in 2020 when the US copyright would expire.

--EncycloPetey (talk) 00:45, 25 June 2017 (UTC)

Agreed. I don't think copyright will matter, anyway, it is PD-US-no notice. Additionally, I propose that the header note should mention metadata of this edition, including UK publication by Unwin and US publication by Scribner. And the Featured Text status should move to this new location of the Atlantic text. Hrishikes (talk) 02:17, 25 June 2017 (UTC)
It's only PD-US-no notice if it was published in the US within 30 days of first publication in the UK. Otherwise the copyright (if any) was restored.--Prosfilaes (talk) 03:41, 25 June 2017 (UTC)
i do not believe we have deleted a work based on URAA, so you may not want to open that can of worms, given the WMF legal advice. Slowking4SvG's revenge 14:37, 26 June 2017 (UTC)
@Slowking4: URAA-based deletion is a regular feature here. Premchand's Idgah was deleted under URAA provision, and later restored when it was proved that it was PD-India on URAA date. The works of Jibanananda Das were shifted to Wikilivres under URAA provision. Same with Sokoli Tomari Iccha and Naya Kashmir. There are many more examples. Non-US works are regularly deleted here when it is found that they were not PD-source country on URAA date. The WMF legal advice you referred to is for allowing foreign works that are PD-source country on current date, not merely URAA date. On that advice, Commons has stopped deletion of works that were not PD-source country on URAA date. This practice has not yet started here. If it starts, then the works of Jibanananda Das will need to be restored. Adopting this policy here is risky. You will do well to remember the direct deletion of Anne Frank's Diary by WMF in Dutch Wikisource, overriding the local community, based on URAA. Hrishikes (talk) 17:00, 26 June 2017 (UTC)
this case is very clearly PD not renewed. what evidence do you need? do you want a transcribed catalog of copyright entries?
sorry to hear you are propagating the URAA hysteria. let the restorations begin. i remember that about Anne Frank, why don’t you let me upload it here as fair use, since it is PD in Australia, and i will take the risk. i do not think that the plaintiff will risk a DMCA takedown given w:Lenz v. Universal Music Corp. the federal judges are very consistent, and i have the $10k ante for federal court, don’t need any EFF help. Slowking4SvG's revenge 22:32, 26 June 2017 (UTC)
In order to state clearly this is "PD not renewed", we would need evidence that the edition was registered for copyright in the US within 30 days of the UK publication. Lacking evidence for that, we cannot say for certain this work falls under PD not renewed. If the original copyright was not filed in the US, or was not filed in 30 days, then the edition may retain copyright under URAA. That's rather the whole point. We need evidence of the original copyright filed and meeting the conditions, and we still need to verify that the text was not substantially altered. If no copyright was filed at the correct time, and if the text is substantially altered, this edition may still be under copyright. --EncycloPetey (talk) 22:47, 26 June 2017 (UTC)
If it was published with permission of the copyright holder within 30 days in the US, it's treated as a US work and is out of copyright for lack of notice as well as lack of renewal. If it wasn't an authorized edition, or it was more than 30 days after the UK edition, then any new copyrightable aspects will be under copyright.
Honestly, this seems like a bit much. There's no real evidence that there's anything copyrightable here, and if there is, there are three years left on its copyright. Someone should split and match it against the old scans, but marginal copyright questions like this shouldn't be that much of a concern, IMO.--Prosfilaes (talk) 04:42, 27 June 2017 (UTC)
I'm inclined to agree with this, especially if we aren't able to determine whether the two publications were 30 days apart. By the time we have all the information we need to know whether it is subject to URAA or not, the copyright may well have already expired. —Beleg Tâl (talk) 05:25, 27 June 2017 (UTC)
registration date is here - Oct. 17, 1924 [19] Slowking4SvG's revenge 14:57, 27 June 2017 (UTC)
Anne Frank's Diary doesn't belong here, since the English translation will be in copyright until 2045 (in the US), and the translator was alive as of 2013. Feel free to bring it up with Commons or nl.Wikisource.--Prosfilaes (talk) 04:42, 27 June 2017 (UTC)
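
The date arithmetic running through this thread can be written out as a small sketch. It only encodes the rules as the participants state them above (a 95-year term for restored copyright and a 30-day window for US publication); it is not legal advice. The function names are hypothetical, and using the 17 October 1924 registration date as a stand-in for the US publication date is an assumption, since the thread does not establish the actual US publication date.

```python
from datetime import date

RESTORED_TERM_YEARS = 95        # term quoted in the thread for restored copyright
SIMULTANEOUS_WINDOW_DAYS = 30   # window quoted in the thread for US publication


def us_public_domain_year(publication_year: int) -> int:
    """First calendar year in which the work is out of US copyright.

    Assumes the term runs through the end of the 95th calendar year
    after publication, so the work becomes free the following 1 January.
    """
    return publication_year + RESTORED_TERM_YEARS + 1


def published_within_window(first_pub: date, us_pub: date) -> bool:
    """True if the US date falls within 30 days after first publication."""
    return 0 <= (us_pub - first_pub).days <= SIMULTANEOUS_WINDOW_DAYS


uk_date = date(1924, 10, 15)  # date given above for the first Atlantic Edition volumes
us_date = date(1924, 10, 17)  # registration date given above, used as a proxy (assumption)

print(us_public_domain_year(uk_date.year))
print(published_within_window(uk_date, us_date))
```

The first result agrees with the 2020 expiry mentioned in the discussion. The second only shows that the two quoted dates are close together; whether the US edition was authorized and actually published in that window is the open question the thread leaves unresolved.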

Disambiguation quandary

The work Once a Week is a literary magazine, but it shares the title with a book by Author:A. A. Milne.

Ordinarily, we would move Once a Week to something like Once a Week (magazine), and use the base name for disambiguation. But the current title is a literary magazine that already has multiple subpages for its series, volumes, and articles. A move would permanently extend the filename of all of the subpages, and require editing all of the links within and between these pages, both in headers and in the Page: namespace.

In this instance, where there is a multi-volume literary magazine involved, would it make more sense to set the disambiguation page at Once a Week (disambiguation), and leave the magazine where it is? --EncycloPetey (talk) 19:19, 23 June 2017 (UTC)

I'm willing to use AWB to disambiguate properly on the magazine. However, is the Milne work being added imminently? If not, there is no need to disambig yet. —Beleg Tâl (talk) 19:33, 23 June 2017 (UTC)
Although the Milne book is not being done yet (there is a good scan at IA [20]), the literary magazine is actively and rapidly growing on Wikisource each day. The longer we delay, the more moves and changes will have to be made. --EncycloPetey (talk) 19:41, 23 June 2017 (UTC)
That's a good rationale. I'll move it over when I'm on my other PC. —Beleg Tâl (talk) 19:54, 23 June 2017 (UTC)
Just to note that the articles are being created as mainspace base pages rather than subpages of the issue. e.g. The philosophy of advertising. Beeswaxcandle (talk) 20:09, 23 June 2017 (UTC)
Good to know. I'll move them to the proper path while I'm at it. —Beleg Tâl (talk) 20:12, 23 June 2017 (UTC)
The Mainspace articles probably ought to be subpages within series, volume, etc., but with redirects left from the Main namespace. I was looking into making those moves when I discovered the disambiguation issue, and decided it ought to be taken care of first. --EncycloPetey (talk) 20:16, 23 June 2017 (UTC)
Agreed. —Beleg Tâl (talk) 20:29, 23 June 2017 (UTC)

Facsimiles of older United States Reports post Google Books' typical full view cut off

Anybody know where these might be found? Prosody (talk) 19:20, 24 June 2017 (UTC)

These volumes are already present at {{List of United States Reports scanned volumes}}. Are you wanting something additional? Hrishikes (talk) 23:47, 24 June 2017 (UTC)
I was unclear, sorry. There are 564 volumes now, and Google Books only has facsimiles publicly available for US users for ones published before ~1920s (not sure what their copyright restriction policies are for users in other countries). Since asking I've found that Internet Archive seems to have some more. Prosody (talk) 17:06, 25 June 2017 (UTC)
the National Archives has it on microfilm through 1997 https://www.archives.gov/research/guide-fed-records/groups/267.html let’s see if i can find a digital copy at citizen archivist. Slowking4SvG's revenge 23:14, 25 June 2017 (UTC)
can’t find a systemic digitization. we have US govt documents, but they are haphazard. maybe a project with a sweep of the scans available would be a start. we have a few of these large projects that are stalled because the scans are crummy and it is so humongous. Slowking4SvG's revenge 01:32, 28 June 2017 (UTC)

A word about clearing the cache and page refresh

We are not alone. Ineuw talk 19:30, 26 June 2017 (UTC)

How to see edit history on a whole text

Is it possible to see the edit history of a whole text? I can see the changes made in the last 30 days through selecting "On Watchlist" in the general Wikisource "Recent Changes" page. I would like to look back and see if anyone or any bot has been working on the project I have been working on, namely An_Exposition_of_the_Old_and_New_Testament_(1828). PeterR2 (talk) 09:31, 27 June 2017 (UTC)

@PeterR2: I'm not sure I understand what you mean by seeing the edit history of a whole text, but if you open the page An Exposition of the Old and New Testament (1828) and then click the "Related changes" link in the left panel (in the "Tools" section), is that what you need? The page that opens shows edits made to the viewed page, to its subpages, and to other pages linked from the main page, so you can see the edits to the whole text of the work (since the whole text consists of the main page combined with all of its subpages). P.S. Sorry if I misunderstood your request. --Nigmont (talk) 21:16, 27 June 2017 (UTC)
I would love to see an option on the watchlist to automatically watch all the subpages of a given page. There are some MediaWiki extensions that do that. Has the possibility already been discussed here? Koxinga (talk) 21:57, 27 June 2017 (UTC)
There is a gadget (although, I can't find it right now because I can't remember what it was called) for watching all pages in a category. There was an idea earlier this year to extend it to cope with following all pages linked on an Index page, but I don't think that bit was finished. As for seeing all history of a work, I think Special:RelatedChanges is the only way, and that has some limitations (mainly that it only goes back 30 days, because it's using data from RecentChanges). Sam Wilson 22:57, 27 June 2017 (UTC)
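
For anyone who wants to look further back than the 30 days that Special:RelatedChanges covers, one possible workaround is to ask the MediaWiki API for the list of subpages and then for each page's revision history. The sketch below only builds the request URLs and makes no network calls. The helper names are hypothetical, and it assumes the work's chapters are mainspace subpages of the title given in the question. It does not cover the Page: namespace, where the proofreading edits themselves are made.

```python
from urllib.parse import urlencode

API = "https://en.wikisource.org/w/api.php"


def subpage_list_url(work_title: str) -> str:
    """URL listing mainspace pages whose titles start with 'work_title/'."""
    params = {
        "action": "query",
        "list": "allpages",
        "apprefix": work_title + "/",
        "apnamespace": 0,
        "aplimit": "max",
        "format": "json",
    }
    return API + "?" + urlencode(params)


def history_url(page_title: str, limit: int = 50) -> str:
    """URL returning the most recent revisions of a single page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": page_title,
        "rvprop": "timestamp|user|comment",
        "rvlimit": limit,
        "format": "json",
    }
    return API + "?" + urlencode(params)


work = "An Exposition of the Old and New Testament (1828)"
print(subpage_list_url(work))
print(history_url(work))
```

Fetching the first URL returns the subpage titles as JSON; each title can then be passed to `history_url` to see who edited that page and when, without the 30-day limit of RecentChanges data.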

Pagelists

Anyone want to finally clear this backlog? There are some I don't feel happy working with for copyright reasons. ShakespeareFan00 (talk) 14:10, 4 July 2017 (UTC)

Join the strategy discussion. How do our communities and content stay relevant in a changing world?

Hi!

I'm a Polish Wikipedian currently working for WMF. My task is to ensure that various online communities are aware of the movement-wide strategy discussion, and to facilitate and summarize your talk. Now, I’d like to invite you to Cycle 3 of the discussion.

Between March and May, members of many communities shared their opinions on what they want the Wikimedia movement to build or achieve. (The report written after Cycle 1 is here, and a similar report after Cycle 2 will be available soon.) At the same time, designated people conducted research outside of our movement. They:

  • talked with more than 150 experts and partners from technology, knowledge, education, media, entrepreneurs, and other sectors,
  • researched potential readers and experts in places where Wikimedia projects are not well known or used,
  • researched by age group in places where Wikimedia projects are well known and used.

Now, the research conclusions are published, and Cycle 3 has begun. Our task is to discuss the identified challenges and think about how we want to change or align with the changes happening around us. Each week, a new challenge will be posted. The discussions will take place until the end of July. The first challenge is: How do our communities and content stay relevant in a changing world?

All of you are invited! If you want to ask a question, please ping me. You might also take a look at our FAQ (recently changed and updated).

Thanks! SGrabarczuk (WMF) (talk) 14:53, 5 July 2017 (UTC)

Wikilivres is now Bibliowiki

Wikilivres has moved and rebranded; they are now Bibliowiki and are located at https://biblio.wiki . Our internal references to Bibliowiki need to be updated.

  • Documentation needs to be updated (I can do this, albeit it may take a while for me to get to it).
  • The interwiki map for [[wikilivres:foobar]] needs to be updated to point to the correct location, and [[bibliowiki:foobar]] should be created as a preferred alternative.
  • Probably other stuff I haven't thought of.

Beleg Tâl (talk) 15:23, 4 July 2017 (UTC)

wikilivres has been redirected and bibliowiki has been created in the global interwiki map. I suggest moving the template to the new name, and updating as necessary. — billinghurst sDrewth 12:30, 8 July 2017 (UTC)

15:07, 10 July 2017 (UTC)

Per project statistics

Following my previous post about progress statistics by project, I decided to do some analysis myself. Based on the latest database dump, I looked at the Page: namespace and only counted the edits which change the status of a page.

It is possible to find many interesting tidbits of information from the different projects. For example:

However, it is mostly interesting to check the status of the backlog. For example:

What I wanted mostly was to know on which projects people are currently working. Dumps are not the most appropriate way to go about it as we miss a few days, but it is possible to know what happened in June using the latest dump from July 1st. In June, 419 projects have been edited (i.e. at least one page changed status), the most active being:

Editions by project in June 2017 (as of July 1st)
Index name | Index status | Pages validated | Pages proofread | Pages empty | Pages remaining | Number of pages modified | Number of revisions | Number of Authors
Index:Travels in Mexico and life among the Mexicans.djvu | To be proofread | 228 | 396 | 62 | 0 | 667 | 910 | 4
Index:Tarzan of the Apes.djvu | Validated | 407 | 0 | 17 | 6 | 410 | 769 | 3
Index:Thoreau - His Home, Friends and Books (1902).djvu | Validated | 346 | 0 | 38 | 0 | 384 | 745 | 15
Index:The Shaving of Shagpat.djvu | Validated | 306 | 0 | 20 | 0 | 308 | 611 | 2
Index:Ballantyne--The Battery and the Boiler.djvu | Proofread | 116 | 316 | 16 | 4 | 432 | 475 | 2
Index:The Novels and Tales of Henry James, Volume 1 (New York, Charles Scribner's Sons, 1907).djvu | Validated | 550 | 0 | 18 | 0 | 455 | 455 | 2
Index:The Novels and Tales of Henry James, Volume 2 (New York, Charles Scribner's Sons, 1907).djvu | Validated | 564 | 0 | 14 | 0 | 434 | 434 | 2
Index:The Bostonians (London & New York, Macmillan & Co., 1886).djvu | Validated | 451 | 0 | 13 | 0 | 430 | 430 | 1
Index:Cuthbert Bede--Little Mr Bouncer and Tales of College Life.djvu | Proofread | 48 | 256 | 14 | 0 | 311 | 425 | 3
Index:Royal Naval Biography Marshall sp4.djvu | To be proofread | 14 | 420 | 18 | 29 | 383 | 385 | 2
Index:Maud Howe - Atlanta in the South.djvu | To be proofread | 7 | 339 | 10 | 6 | 348 | 352 | 2
Index:Morley--Travels in Philadelphia.djvu | Proofread | 203 | 69 | 16 | 0 | 255 | 347 | 2
Index:Ballinger Price--Us and the Bottle Man.djvu | Validated | 162 | 0 | 14 | 0 | 176 | 338 | 4

Is there any interest for this kind of statistics and analysis? I understand Wikisource is currently driven by very dedicated users who start and often finish a work all by themselves. However, for a more casual editor, who wants to simply proofread a few pages and see a complete book including his work without having to wait for years, this could be a good extension to the proofread of the month (which is clearly visible in the table above!).

Technical description: I parsed a dump of the database to extract each project (based on the index pages), each page (based on the Page: namespace) and each revision changing the status of the page (not proofread, proofread, validated, etc.). The link between the page and the project is made by looking at the page name. This approach means I don't deal well with the projects where the page is not a subpage of the index (there are 8769 of these). I also extracted the number of pages of the file, in order to take into account pages not yet created (I did not find how to get this data from the database directly, so I had to scrape the HTML of the Commons page).

Koxinga (talk) 08:45, 9 July 2017 (UTC)
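
For readers who want to try something similar, the approach described above might look roughly like the sketch below. It is not the script actually used for these statistics. The shape of the revision objects (title and text fields) and all function names are assumptions made for illustration. It relies on the <pagequality level="N" /> tag that ProofreadPage stores in the wikitext of each page in the Page: namespace.

```javascript
// Sketch only: the revision objects and field names are hypothetical,
// not the layout of a real database dump.

// ProofreadPage keeps the page status in a <pagequality level="N" ... /> tag:
// 0 = without text, 1 = not proofread, 2 = problematic,
// 3 = proofread, 4 = validated.
function qualityLevel(wikitext) {
    var match = /<pagequality\s+level="(\d)"/.exec(wikitext);
    return match ? Number(match[1]) : null;
}

// "Page:Foo.djvu/12" -> "Index:Foo.djvu". Returns null when the page is not
// named as a numbered subpage, which is the case the post says it handles badly.
function indexForPage(pageTitle) {
    var match = /^Page:(.+)\/\d+$/.exec(pageTitle);
    return match ? 'Index:' + match[1] : null;
}

// Count, per index, the revisions whose quality level differs from the
// previous revision of the same page. The first revision of a page counts
// as a change. Revisions must be supplied in chronological order.
function countStatusChanges(revisions) {
    var lastLevel = {};
    var counts = {};
    revisions.forEach(function (rev) {
        var index = indexForPage(rev.title);
        var level = qualityLevel(rev.text);
        if (index === null || level === null) {
            return;
        }
        if (lastLevel[rev.title] !== level) {
            counts[index] = (counts[index] || 0) + 1;
        }
        lastLevel[rev.title] = level;
    });
    return counts;
}
```

Matching on the page name keeps the sketch simple, at the cost of skipping the pages that are not subpages of their index, as noted in the post.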

From the perspective of "completeness" of works, we are interested in works that are nearly proofread, or nearly validated, that have not been edited for a period of time, so we can put resources to them. They are cheap wins with true value. If you are looking to see missing/non-created pages of a work, then you probably want to get a count of pages from the File: and compare that with the number of subpages of Index:. That would be a neat comparison as that would be another indicator of near completeness.
The other factoids are interesting trivia, though I am not sure that they are particularly enlightening for the site or our work, though I could just be considered a boring, unexcitable, unromantic, task-focused fart. Noting that the stats about projects don't consider our multiple-volume projects (EB1911, DNB, DMM, +++). Thinking of what would be useful: numbers of Index: works with counts for images missing, scores missing, etc., so we could focus efforts, or promote efforts to assist completion. The number of edits on a work is not relevant, though maybe the time from creation to validation may have some social interest, though even that is dodgy if the work has advertising. We already track our validated and proofread works, and try to keep on top of transclusion status. (As said, I may have the wrong focus for what is interesting to the trivia buffs.) — billinghurst sDrewth 13:19, 9 July 2017 (UTC)
Noting that pages remaining (not proofread) can be due to works having their advertising pages remaining, e.g. Ballantyne's work above, so a work like that has been marked as proofread (important), and we are tracking by a category that its advertising is not done. So there can be unread pages in a proofread or validated work, whereas a small number of unread pages in a work not yet proofread is interesting. We are a complex beast. :-) — billinghurst sDrewth 13:25, 9 July 2017 (UTC)
Finding a work that had no pages remaining (ie. nothing to be proofread) for a work marked as "not proofread" is very useful as it enables us to review and reclassify as required by the review. — billinghurst sDrewth 14:03, 9 July 2017 (UTC)
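
As a rough illustration of that check, the sketch below filters a list of per-index rows for works still marked "To be proofread" although no pages remain. The three rows are copied from the June table earlier in this thread. The object layout and the function name are made up for the example.

```javascript
// Rows copied from the June 2017 table in this thread; the field names
// are hypothetical and exist only for this sketch.
var juneStats = [
    { index: 'Index:Travels in Mexico and life among the Mexicans.djvu', status: 'To be proofread', remaining: 0 },
    { index: 'Index:Royal Naval Biography Marshall sp4.djvu', status: 'To be proofread', remaining: 29 },
    { index: 'Index:Tarzan of the Apes.djvu', status: 'Validated', remaining: 6 }
];

// Return the names of indexes whose manual status still says
// "To be proofread" even though no pages remain to be proofread.
function needsStatusReview(rows) {
    return rows
        .filter(function (row) {
            return row.status === 'To be proofread' && row.remaining === 0;
        })
        .map(function (row) {
            return row.index;
        });
}

console.log(needsStatusReview(juneStats));
```

With these three rows, only the Travels in Mexico index is listed, which matches the discrepancy discussed further down the thread.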
Thank you for your comments. A few answers:
  • The trivia was to show the different possibilities, but I mostly aim to do something useful for project tracking and motivation of the different users. I know that at least for me, it would be motivating to see which works are being actively worked on, so that I can see progress when I come back to it, I know I can ask questions and exchange about the project, etc.
  • Yes, I use the actual number of pages of the uploaded file, so I can find the pages not yet created, I mostly consider them the same as the "not proofread" pages but it can be separated if needed.
  • I don't trust the "proofread", "validated", etc. flag in the index. It is manually set, so there can be mistakes in one direction or another. That's why I think it is useful to compare it to the actual situation of each page.
  • It is possible to remove the advertisement pages from my analysis, based on the <pagelist> tags, but we need to define a consistent marking for them. I saw some adv, adv., advt, advert (with a bonus "index to advert"), advertisement. Do we allow all of these or do we try to normalize?
  • I can take into account multi-volume projects and group them together, by looking at the Volumes part of the index page, especially the Category:Scanned volume navigation templates, I will look into it.
Koxinga (talk) 19:43, 9 July 2017 (UTC)
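
Until a single marking is agreed, the label variants listed above could be matched with one pattern. The following is only a sketch: the function name is hypothetical and the pattern covers just the spellings mentioned in this thread (adv, adv., advt, advert, advertisement and their plurals).

```javascript
// Returns true when a <pagelist> label looks like one of the advertisement
// markings mentioned in the thread. The word boundaries keep unrelated
// words such as "advance" or "adverb" from matching.
function isAdvertLabel(label) {
    return /\badv(?:t|ert(?:isement)?)?s?\b/i.test(label);
}

// Count the labels that are not advertisements, for example to leave
// advertising pages out of a "pages remaining" figure.
function countNonAdvert(labels) {
    return labels.filter(function (label) {
        return !isAdvertLabel(label);
    }).length;
}
```

A shared pattern like this would work with the existing labels as they are, though normalizing to one spelling would make the pattern unnecessary.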
There is something wrong with the information gathered. Index:Popular Science Monthly Volume 12.djvu was completely validated a long time ago, and the proofreading of Index:Travels in Mexico and life among the Mexicans.djvu was also completed, perhaps at the beginning of this month. — Ineuw talk 04:41, 12 July 2017 (UTC)
My analysis is based on a database dump, using the most recent one from July 1st. At the time of this dump, Page:Popular Science Monthly Volume 12.djvu/430 was not yet validated; that was done after I posted this message. For Index:Travels in Mexico and life among the Mexicans.djvu, I did say that there were 0 pages remaining; however, at the time of the dump, even though all the pages had been proofread, the index status was still "to be proofread". It was also changed just after I posted this message. If there is interest, it would be possible to use the recent changes to update the data more frequently, but judging from the lack of response here, it does not seem worth it. Koxinga (talk) 01:12, 13 July 2017 (UTC)
Thanks for clarifying. It's interesting. — Ineuw talk 09:50, 13 July 2017 (UTC)

Using Template:SIC with incorrect punctuation?

I have an unclosed bracket ("parenthesis" for any Americans reading this) in the first paragraph of the commentary on Chapter 14 at Page:An_Exposition_of_the_Old_and_New_Testament_(1828)_vol_1.djvu/125. I can't work out how to show that the first comma after the opening bracket should be a closing bracket, as shown in other editions from the 18th and 19th centuries. --PeterR2 (talk) 08:07, 11 July 2017 (UTC)

My own preference is not to mark anything in these situations. I just replicate what the printed text says. Beeswaxcandle (talk) 08:34, 11 July 2017 (UTC)
I guess people do these things for different reasons. I am working on this because I want to contribute to a reliable online text of a good edition of Matthew Henry's Bible commentary. The only existing one, which is used in various mobile phone apps, is from an unknown edition/editions and therefore not possible to proofread.

--PeterR2 (talk) 08:23, 12 July 2017 (UTC)

If I think it's important to show what seems to be the correct punctuation, sometimes I include the word attached to the punctuation to make it a little more visible. So if I understand correctly the place you're talking about, you could try this: {{SIC|history,|history)}}, which results in: history, — is that more or less what you had in mind? Mudbringer (talk) 10:04, 12 July 2017 (UTC)
I agree, and have occasionally done the same as Mudbringer, when I was concerned about the authenticity of the text. For some texts, it's not worth noting, and in some cases it's actually a printing / scanning issue. I've come across scans where there ought to have been a period at the end of a line, but none is visible in the scan, or where the scan shows a period, but it ought to be a comma. In some of these cases, I have had access to a printed copy, and could see the impression of the period, or the slight bit of ink starting a comma. Sometimes the ink isn't properly distributed by the printer, or the punctuation type was defective at the original press. In those situations, it's not worth marking. --EncycloPetey (talk) 17:36, 12 July 2017 (UTC)
Or just stick the correct punctuation inside an <includeonly> so it displays per the scan in the Page: ns, and displays corrected in the main ns. Stick an HTML comment in place if you really need to have a comment. I would not normally use {{SIC}} to correct punctuation; it pretty much defeats the purpose if it is a necessary piece of punctuation. — billinghurst sDrewth 12:47, 15 July 2017 (UTC)
In the case PeterR2 brought up, it would be necessary to have history<includeonly>)</includeonly><noinclude>,</noinclude> since there's also a comma that needs to be suppressed in the transcluded text. If this is an approach often taken on Wikisource, wouldn't it be better to have a template to do this, to make it clear that this is a permissible option, and make it possible to find places where this has been done? Mudbringer (talk) 17:59, 15 July 2017 (UTC)
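
To make the effect of the two tags concrete, here is a much simplified model of how such wikitext is shown when the page is viewed directly and when it is transcluded. It is an illustration only, not how MediaWiki actually parses pages: the render function is hypothetical and ignores <onlyinclude>, nesting and templates.

```javascript
// Simplified model of <noinclude> and <includeonly>:
// - viewed directly (Page: namespace): <includeonly> content is dropped,
//   <noinclude> content is kept;
// - transcluded (main namespace): the reverse.
function render(wikitext, transcluded) {
    var keep = transcluded ? 'includeonly' : 'noinclude';
    var drop = transcluded ? 'noinclude' : 'includeonly';
    return wikitext
        // Remove the dropped tag together with its content.
        .replace(new RegExp('<' + drop + '>[\\s\\S]*?</' + drop + '>', 'g'), '')
        // Unwrap the kept tag, leaving its content in place.
        .replace(new RegExp('</?' + keep + '>', 'g'), '');
}

var source = 'history<includeonly>)</includeonly><noinclude>,</noinclude>';
console.log(render(source, false)); // history,  (as printed in the scan)
console.log(render(source, true));  // history)  (corrected in the transcluded text)
```

Unlike {{SIC}}, this leaves no visible trace of the correction for the reader, which is why an HTML comment or a dedicated template would help others find such places.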

Style changes?

When I used to edit index pages like this:- Index:The Atlantic Monthly, Volume 18.djvu

The field boxes USED to be in monospace. They currently aren't, which means that options overrun.

Is this a style change on Mediawiki, or a local configuration issue with an updated Firefox? ShakespeareFan00 (talk) 10:59, 13 July 2017 (UTC)

sure looks like a style change in media wiki. index page editing interface change. is there any documentation / notice for this? Slowking4SvG's revenge 11:20, 13 July 2017 (UTC)
[Wikisource-l] Tech details
GSoC Proposal[2017]: Improvements to ProofreadPage Extension and Wikisource
Weekly reports: Improvements to ProofreadPage Extension and Wikisource, Zdzislaw (talk) 20:50, 13 July 2017 (UTC)
In summary, there was no notice where the ordinary user of enWS would see it. It's a pity the GSoC Proposal was called "improvements" when so far it's resulted in a new user right that had to be reversed almost immediately after implementation and this uglification of something that worked just fine. Beeswaxcandle (talk) 07:51, 14 July 2017 (UTC)
Hmm, I do not think so... The page editor interface will be switching over to OOUI soon; it is now ready for deployment to Wikimedia wikis, see: The_Atlantic_Monthly/Volume_2/Number_2/The_Autocrat_of_the_Breakfast-Table. So adapting "the rest" of the ProofreadPage extension UI is also required. It would be nicer to say "thanks" to someone who wants to do this... and get some skins for improvement... Zdzislaw (talk) 16:34, 14 July 2017 (UTC)
As I am no UI guy, I miss why a switch to OOUI implies also a style change, but never mind, was just curious. Just a comment: the style of the summary section in the pages above is different for the Index and the Main ns.— Mpaa (talk) 17:21, 14 July 2017 (UTC)
After switching to OOUI they will be the same - Index:The_Atlantic_Monthly,_Volume_18.djvu, Zdzislaw (talk) 17:34, 14 July 2017 (UTC)
You mean they will both be glitchy? I'm not getting arrow images by some of the drop-down items (like "Progress") and it looks as though the limitations on values for Year of Publication have gone away. It would have been nice if the change had been explained clearly to non-tech-minded users beforehand, or better still, if it had been tested on non-Wikipedia projects like Wikisource. I hate to think what this will do to Wiktionary editing. --EncycloPetey (talk) 21:21, 15 July 2017 (UTC)

22:59, 17 July 2017 (UTC)

Future changes previously mentioned: TemplateStyles

Mentioned first above at #Tech News: 2017-24

Hi all,

we'll enable TemplateStyles (mw:Extension:TemplateStyles and mw:Help:TemplateStyles) tomorrow on mediawiki.org, wikitech.wikimedia.org and some test wikis. (Today for those of you in UTC or later time zones.)

TemplateStyles allows editors to add complex CSS to templates with the help of a <templatestyles> tag. This makes template maintenance easier, lowers the barrier of access (previously you had to be an admin to be able to add new CSS) and empowers editors to create more user-, mobile- and print-friendly templates.

For plans for rolling it out to content wikis see phab:T168808.

Gergo Tisza, wikitech-l mailing list

Another question about copyright.

Looked up a short microfilmed article on the New York Times, downloaded the page as .pdf, and typed the contents into a text file (pdf copy and paste was blocked). At the bottom of the microfilmed article was the following claim: Published: February 12, 1877 Copyright © The New York Times. Is it or is it not in the public domain? — Ineuw talk 02:17, 16 July 2017 (UTC)

It's public domain. The assertion of copyright there is at best imprecise (at worst fraudulent). If it was published before 1923 (anywhere in the world) it is safe to assume it's public domain in the US. And since this was first published in the US we need not take into account any differing copyright terms in other countries (i.e. the annoying rules for magazines published in the UK). --Xover (talk) 14:05, 16 July 2017 (UTC)
Anyone can make a claim about anything... Whereas the reality is that I got the most votes at the recent US election... But it doesn't necessarily make it factually correct. — billinghurst sDrewth 14:11, 16 July 2017 (UTC)
do not know why experienced editors keep asking. there is a reflexive naivete towards the false boilerplate that institutions persist in. how many items have been deleted on false claims? it shows that the copyright determination is not balanced, but tilted sharply toward delete. Slowking4SvG's revenge 21:48, 16 July 2017 (UTC)
@Slowking4: Experienced editor, maybe. Knowledgeable, I doubt that. I will bring up the matter with The New York Times, since I love to bother them occasionally. — Ineuw talk 16:21, 18 July 2017 (UTC)
bother them all you want, their legal department does not care. you could also bother Getty, MacArthur Foundation, National Portrait Gallery, London, Louvre, Smithsonian Institution, etc, etc. [32] Slowking4SvG's revenge 12:56, 19 July 2017 (UTC)

Schedules are WRONG, the columns don't match the info, please check my comment in February!

Hello, the schedules haven't been corrected, I advised you about this back in February!

Regards, Neil, South Africa

@ShakespeareFan00: is this you? The only discussion I can find on the Scriptorium regarding broken schedules is Wikisource:Scriptorium/Archives/2017-03#Enough! from March 1, and I thought that was figured out. -- Otherwise, Neil, we'll need more info. What are you referring to? Did you possibly mean to post that comment elsewhere? —Beleg Tâl (talk) 12:27, 21 July 2017 (UTC)
The only other message from Neil that I can find was on my talkpage a month ago: User talk:Beeswaxcandle#Registration Acts of 1836, England. I had, and have, no idea why I was messaged directly about it. Beeswaxcandle (talk) 19:52, 21 July 2017 (UTC)
Hmm, we do have 1836 (33) Registration of Births &c. A bill for registering Births Deaths and Marriages in England and another for marriages. — billinghurst sDrewth 08:58, 22 July 2017 (UTC)
and 1836 (34) Marriages. A bill for Marriages in England though the schedules look okay to me at first blush. — billinghurst sDrewth 09:01, 22 July 2017 (UTC)

15:57, 24 July 2017 (UTC)

16:43, 24 July 2017 (UTC)

Marking multiple pages as proofread

Is there a way to mark pages proofread (or validated), without opening each page one by one, edit, mark, save, etc.? I'm reading a book offline. The text is currently marked as "not proofread", but it is much better than raw OCR, so the majority of the pages need no change. I'd like to, e.g., read a couple of chapters, make the corrections needed in the few affected pages and then mark all pages from n to m as "proofread". Any way to do this? Jellby (talk) 14:16, 25 July 2017 (UTC)

I doubt there is a way, because it would be too easily abused :) ... but the process can be sped up. For Windows, you can open a whole series of pages from the Index page by control-clicking with the mouse, then each page you can begin editing with alt-shift-e, immediately jump to the summary box at the bottom with alt-shift-b, click the proofread circle, then either click the publish button or publish the page with alt-shift-s. That shouldn't take more than 20 seconds per page. Perhaps someone with Autohotkey has figured out how to mark a page as edited with a single keystroke. I'd personally recommend clicking into the text box and giving one last look with a spellcheck to pick up any easy errors you might have missed. It's amazing how easy it is to overlook stuff. Mudbringer (talk) 06:53, 26 July 2017 (UTC)
It's a fairly straightforward piece of JS too. Basically, from editing, you just need to do something like $('span.quality3 input').click(); to mark the page proofread and then do a click() on the "Save" button. This is how you could add it to the TemplateScript sidebar and give it an "Alt-Shift-Q" shortcut (the Alt-Shift prefix depends on browser and platform):
// Quick proofread tool for preloaded proofread text

"use strict";

var quick_proofread = {
    
    init: function() {
        var self = this;
  
        // only care for editing in the Page: namespace
        if ( !(mw.config.get( 'wgAction' ) === 'edit' || mw.config.get( 'wgAction' ) === 'submit' )
            || mw.config.get( 'wgNamespaceNumber' ) !== 104 ) {
            return;
        }

        console.log("Installing Quick PR tool");

        $.ajax('//tools-static.wmflabs.org/meta/scripts/pathoschild.templatescript.js', { dataType:'script', cache:true }).then(function() {

            // Add tool to sidebar
            pathoschild.TemplateScript.add(
                [
                    {  	name: 'Quick proofread', 
                        position: 'cursor',
                        accessKey: 'q',
                        script: function(editor) {
                            self.mark_proofread();
                            self.save_page();
                        },
                    },
                ],
                { category:'page', forNamespaces:'page' }
            );
        }); // end ajax
    },
    
    mark_proofread: function() {
        $('span.quality3 input').click();
    },
    
    save_page: function() {
        var summary_text = "Quick proofread from offline proofreading";
        $('#wpSummary').val(summary_text);
        $('#wpSave').click();
    },
};

quick_proofread.init();
All you have to do then is wait for the tool to appear in the toolbar, make sure the text is what you expect and press Alt-Shift-Q to mark proofread and save the page with a boilerplate summary.
In theory you can do it all with an API call too, but you'll probably want to use a bot framework like Pywikibot, as you'll otherwise have to deal with edit tokens and the like manually. If you do that for a more automated, hands-free method, you may need a bot flag, but the advantage is that you can load the text from your computer directly, and it's fire-and-forget once written. Inductiveloadtalk/contribs 10:17, 26 July 2017 (UTC)
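
For the API route, the text-handling half could look something like the sketch below. It assumes the page wikitext starts with the usual <pagequality level="N" user="..." /> header and only rewrites that tag. The function name is hypothetical, and fetching the page, obtaining an edit token and saving are deliberately left out. Whether a given status change is accepted on save depends on the wiki's ProofreadPage rules.

```javascript
// Rewrite the <pagequality> tag in the wikitext of a Page: namespace page so
// that the page is marked as proofread (level 3, the same level the
// span.quality3 radio button in the script above stands for).
function markProofread(wikitext, user) {
    var tag = /<pagequality\s+level="\d"\s+user="[^"]*"\s*\/>/;
    if (!tag.test(wikitext)) {
        throw new Error('No <pagequality> tag found');
    }
    // A replacer function is used so that a "$" in the user name is not
    // treated as a special replacement pattern.
    return wikitext.replace(tag, function () {
        return '<pagequality level="3" user="' + user + '" />';
    });
}
```

The returned text would then be saved through whichever client is used, for example a bot framework, together with any corrections made offline.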

Accessible editing buttons

--Whatamidoing (WMF) (talk) 16:56, 27 July 2017 (UTC)

Template:Article needs refactoring

If someone has the time, the template needs to be refactored as its mix of spans and divs seems skewiff. It is also very specific with a general name, and I think that needs some tidy up, or other explanation would be worthwhile. — billinghurst sDrewth 04:54, 30 July 2017 (UTC)

With a whole two (2: count them) main-space consumers who the hell cares. Substitute and delete it? 110.146.167.230 05:38, 30 July 2017 (UTC)
Agree, this isn't a useful template. —Beleg Tâl (talk) 19:49, 31 July 2017 (UTC)

21:45, 31 July 2017 (UTC)

Validating question

I'm running across something I've never seen before, and don't understand. So far, in two books. Please see Page:Lives of Fair and Gallant Ladies Volume II.djvu/303 which has coding for sections, and Page:The Life of Mary Baker G. Eddy.djvu/57. This has coding for part 1 and part 2. I don't see these divisions on the scanned images, so I don't know what the editors are doing? Can anyone clarify for me, please? Maile66 (talk) 18:13, 31 July 2017 (UTC)

If you look at the transcluded page for the second case The Life of Mary Baker G. Eddy/Chapter 02#33, you can see that an image on another page has been inserted between paragraphs, rather than breaking up the paragraph that continues from the end of the page to past the image. I've done that myself, as on this page: Wonder Tales from Tibet/The Clever Prince and the Stupid Brother#6. I don't know how accepted the technique is, but it makes sense to me. I don't see anything unusual about the first example you show. Mudbringer (talk) 19:16, 31 July 2017 (UTC)
This syntax is known as "labelled section transclusion" and it's used when you want to transclude part of a page instead of the whole thing. This feature is documented at Help:Transclusion#How to transclude a portion of a page. —Beleg Tâl (talk) 19:45, 31 July 2017 (UTC)
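
As an illustration of what those section markers do, the sketch below pulls the text of one labelled section out of a page, which is roughly what happens when only that section is transcluded. It is a simplified model with a hypothetical function name, not the extension's code. The part1 and part2 labels follow the second example mentioned above, and the sample text is invented.

```javascript
// Return the text between <section begin="name" /> and <section end="name" />,
// or null when the page has no section with that label.
function extractSection(wikitext, name) {
    // Escape characters that have a special meaning in regular expressions.
    var escaped = name.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    var pattern = new RegExp(
        '<section\\s+begin="?' + escaped + '"?\\s*/>' +
        '([\\s\\S]*?)' +
        '<section\\s+end="?' + escaped + '"?\\s*/>'
    );
    var match = pattern.exec(wikitext);
    return match ? match[1] : null;
}

var page =
    '<section begin="part1" />end of the paragraph.<section end="part1" />' +
    '<section begin="part2" />Start of the next paragraph<section end="part2" />';

console.log(extractSection(page, 'part1')); // end of the paragraph.
console.log(extractSection(page, 'part2')); // Start of the next paragraph
```

Splitting a page this way lets the main-namespace page transclude part1, then an image from another page, then part2, without breaking the paragraph in the middle.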
This is good information to have. From now on, I can accept this as correct. Maile66 (talk) 21:12, 31 July 2017 (UTC)
yeah, placement of images can be tricky, and i would argue for some flexibility in shifting from a strict printed location to where it flows better in the completed work. for example, in A Woman of the Century, there were errata images, which i inserted in the corresponding place. [48] Slowking4SvG's revenge 12:04, 2 August 2017 (UTC)
For some works I have simply used <noinclude> to display the image in the Page: ns, and <includeonly> in the main ns. I did it that way as the image placement in the work put the image right out of context, which was ridiculous in how we had the book typeset. You just need a reasonable reason, and to annotate with an HTML comment in circumstances where an explanation is useful for proofreaders. — billinghurst sDrewth 13:57, 2 August 2017 (UTC)

Proposal to allow "fair use" in certain limited scenarios

There have been a few discussions lately about "fair use" on enWS. I think there is one specific scenario in which "fair use" should be acceptable: if a work is released under an acceptable license, but contains some non-free text (or other media) under "fair use" (or with explicit permission of the copyright holder), we should be able to include that text or other media as part of the entire work that has been released freely.

Rationales:

  1. It is not always possible to determine that a selection from a free text is actually a non-free citation included under "fair use".
  2. If an author can release a work under a free license even though it contains "fair use" selections, we should be able to host it even though it contains "fair use" selections.

Example: Green Eggs and Ham is the usual example of a nonfree work that has been published under a free license by a third party under "fair use", as it was included in the congressional record after someone read it out loud in congress. While it would be unacceptable to host Green Eggs and Ham as a work on its own, we could (possibly) host the congressional records under a free license, and my proposal above would simply suggest that we don't need to censor the section that quotes the nonfree work.

Example: The Book of Common Prayer (ECUSA) almost certainly contains translations of religious texts that are non-free. Can you identify these passages?

Anyway, this is just something I was thinking of that might be acceptable, and so I thought I'd bring it up. @Slowking4: this discussion will interest you. I think your idea of what "fair use" should be acceptable is broader than what I suggested above, but this discussion or a sub-discussion could be the place for that as well. —Beleg Tâl (talk) 13:43, 18 July 2017 (UTC)

@Beleg Tâl: You may wish to explain how this is different from, or if in fact it is, simply being a textual equivalent of c:Commons:De minimis. Mahir256 (talk) 18:52, 21 July 2017 (UTC)
@Mahir256: This proposal is different from de minimis. De minimis is usually so trivial as to cause no violation of copyright law. It is basically an uploader's defense, no site policy required. This proposal is more in the line of Exemption Doctrine Policy (1, 2). EDP is applicable to identifiable non-free content within a free content, but an EDP rationale is mandatory within the license tag. Hrishikes (talk) 03:55, 22 July 2017 (UTC)
I am willing to explore the issue where the copyright of an included work is vague or unknown, I am not comfortable with reproducing a work known to be within copyright, and the example provided pushes me straight away. The publishing of the congressional record should not be a reason for us to reproduce the work "Green Eggs and Ham". Our reproducing of the parent is not enhanced with GE&H, and the CR would be as worthy with that component redacted. — billinghurst sDrewth 09:09, 22 July 2017 (UTC)
I'm a little unsure about that example as well. The Congressional Record can be a bit of a mess, with a lot of random stuff read into it, but GE&H is easy to locate and remove and not particularly relevant.
Let me place an example on the table. The NTSB report for the crash of Korean Air Flight 801 on Guam. Page 6 is labeled "Instrument approach chart for the Guam International Airport runway 6L ILS procedure. Reproduced with the permission of Jeppesen Sanderson, Inc. NOT TO BE USED FOR NAVIGATION." Pages 16 & 17 are diagrams of the captain's and first officer's instrumentation panels, courtesy of Boeing. Page 35 is another instrument approach chart from Jeppesen, compared against the one on page 6 on the next page. Page 106 is a half-page graph, courtesy of Boeing, et al. That's the complete list of marked non-free works in a 212-page report. Having to leave those out would certainly discourage me from working on it. Is this something we want to support, that we feel we can claim as fair use? --Prosfilaes (talk) 16:13, 22 July 2017 (UTC)
I think that nonfree text that is included in a free text with permission, as in your example, should definitely be hostable. That's not really "fair use" though is it? It's more like a license from the cited text's author to allow that portion of the text to be published in the containing work under its license. —Beleg Tâl (talk) 16:53, 22 July 2017 (UTC)
the "instrument approach chart" is typically copyrighted, and so is a case for the fair use proposal. large blocks of quoted text (more than de minimis) may be used by others without permission under a fair use rubric, and so another case to adopt the proposal. (i.e. in general no one gives permission to Congress to license their testimony under a PD-gov). we could adopt an EDP for "texts that are part of the public record, not commercially available" to avoid the "full green eggs and ham" straw man.
see also Wikisource:Scriptorium/Archives/2014-05#Dealing_with_non-free_images_in_transcriptions_of_freely_licensed_works Wikisource:Scriptorium/Archives/2016-10#Exemption_Doctrine_Policy_.28EDP.29, and Wikisource:Copyright policy Slowking4SvG's revenge 13:55, 23 July 2017 (UTC)
Actually, I have an example that may be relevant to this case, though yes, I wouldn’t call it fair use either. Have a look at Internet Health Report v.0.1. The entire work is released under CC BY-SA 3.0 but there is a caveat in the license text that says "excluding portions of content attributed to third parties." So I guess the text can be included here as long as we included these third party attributions as it appears in the text, couldn’t it? Ciridae (talk) 03:42, 24 July 2017 (UTC)
The issue about fair use is infringing on the rights of authors to direct their copyright, so if it is a snippet or lower quality (per enWP) component of a work included in another and that is how it is published, then to me that would seem more reasonable. If it is a complete work, or a high quality reproduction, or not meeting the author's intent, then I do not think that it is okay. Remembering that we allow people to take our works and sell them as long as they maintain our licensing requirements. Writing that as a policy statement is problematic, and is always going to be needing adjudication, and that will suck IMNSHO. — billinghurst sDrewth 04:41, 24 July 2017 (UTC)
in the "UN Internet Health Report" example on page 4, you find a cover of the economist magazine. this is clearly fair use (surprised it has not been deleted already). this is another example that the EDP images proposal would allow here. we very well could host texts for scholarly reuse, but choose not to out of concern for the profits of others. it is a disagreement about the mission.
do you want to limit the image size? the images of page scans are notoriously small size, and not a replacement for the original. - i am not a big fan of the english image size reduction. i see a stream of downsizing from 60 kbytes to 20 kbytes, it is a distinction without a difference, and it clouds the image provenance.
"meeting the author's intent" when a snippet gets quoted / pasted, that is a transformative reuse, different from the original author’s intent - also we had a presentation at wikiconUSA from people who specifically make parody works, with a legal department - they have an unbeaten record in federal court defending fair use. (author’s intent or parody has not been at issue in the proposals, rather they are about including government / open access documents with copyrighted snippets) Slowking4SvG's revenge 10:19, 24 July 2017 (UTC)
To sum up the discussion so far:
  • Commons:De minimis is already acceptable
  • Nonfree text included in a free work with author permission is also acceptable (example "Instrument approach chart"), and I'll comment further that this is essentially equivalent to the author releasing the quoted text under the including work's free license; if the including work's license contains a caveat for the included work then this may not be acceptable.
  • The original question regarding fair use in a free work (i.e. with no author permission) has no consensus, with the following perspectives:
    • All quoted works that would constitute "fair use" in the containing work under US law should be hostable (my proposal)
    • Quoted works known to be copyrighted should be removed; works with vague or unknown copyright might be hostable (billinghurst)
    • Quoted works that are copyrighted and commercially available should be removed; other copyrighted quoted works might be hostable (suggested by Slowking4)
    • Complete works and high-quality reproductions should be removed; snippets and lower quality components might be hostable (billinghurst)
The above is just to keep things organized. —Beleg Tâl (talk) 14:44, 1 August 2017 (UTC)
I would suggest that, if the proposal were to pass, we could use a license tag to handle rationale, something like this:

 

This work is freely licensed or in the public domain, but contains non-free content. This is okay because:

  1. WS operates under US law, and this content is considered fair use under US law
  2. WS's EDP allows fair use content under certain conditions, see below
  3. WMF allows fair use content under certain conditions, see below

etc


Or something along those lines. —Beleg Tâl (talk) 15:09, 1 August 2017 (UTC)
good summary - i would quibble that "Reproduced with the permission of Jeppesen Sanderson" ≠ "releasing the quoted text under the including work's free license"; rather it is what it says: permission to reproduce in the context of the document, and a credit. unknown derivative license. i.e. [49]
i would be happy with any of these versions of an EDP. Slowking4SvG's revenge 22:12, 2 August 2017 (UTC)

I'm opposed to this. The focus seems to be too much on removing obstacles to hosting materials that we as editors would like to host, and not enough on our mission and our consumers. I think we are better off as a site that hosts public domain material, period. Hesperian 01:09, 3 August 2017 (UTC)

@Hesperian: I think that both focuses lead to the same end result. I want to remove obstacles to hosting public domain material that is within our mission and for the benefit of our consumers. I point again to my examples above, i.e. the Congressional Record and the BCP. Both are public domain material, both are valuable to our consumers and both are within our mission. However, our current policy requires us to censor such works because they contain material that is included as "fair use". WS operates under US copyright law, and under US copyright law it is perfectly acceptable to put a text in the public domain even if it contains material that is "fair use". I want to remove obstacles that prevent WS from hosting these public domain texts. Also: I think that it will help our users and editors: they can trust that if the work is in the public domain, that they can host it here without any further problems. —Beleg Tâl (talk) 18:57, 4 August 2017 (UTC)
So what it comes down to in these unusual cases, is do we
  1. Provide our readers with a complete but encumbered text; or
  2. Provide our readers with an incomplete text that, by virtue of its incompleteness, qualifies as a free cultural work: one that our readers are allowed not only to read, but also change, improve, incorporate, copy, distribute, even commercialize.
In my opinion, option 2 is more in line with our mission.
Hesperian 03:16, 5 August 2017 (UTC)
If you are giving me one of two choices, then I too will favour 2). I still feel that I fall into something that scores a 1.8, however, if it is binary, you have my opinion. — billinghurst sDrewth 05:17, 5 August 2017 (UTC)
it is a blinkered, diminished vision. misstatement of option 1 - i.e, it is "free text with encumbered illustrations, or encumbered block quotes" (that user could redact). all you are doing is moving traffic to institutional transcription sites, where they have control over the works, not commons. Slowking4SvG's revenge 00:36, 7 August 2017 (UTC)
To move the discussion forward, I'm prepared to grant that one of us has a "blinkered, diminished vision". Also, I'm okay with moving traffic to other sites if that traffic is people looking for non-free material. Hesperian 01:30, 7 August 2017 (UTC)
I'd be happy with something intermediate such as 1.8 if it can be put into practice. I understand the desire to have our works be completely free in all their parts, as Hesperian mentioned, but again I question the feasibility of it. Using again the example of Book of Common Prayer (ECUSA), which is in the public domain in the USA, I challenge any editor to distinguish with any precision the parts which are original or pre-1923 from the parts that are fair use but copyrighted or UK-URAA. We can actually use this as a case study, if we like: what actions are we as a community willing to do in order to preserve this work in our collection? Option 1 "complete but encumbered" would be just to keep the whole thing, noting that some parts are (probably) copyrighted and included under fair use; option 2 "free cultural work" would be to research every part individually and censor as necessary, a massive undertaking - or to give up and delete the whole thing outright; any approach between the two could be a precedent for a policy on how to handle "fair use". —Beleg Tâl (talk) 12:10, 8 August 2017 (UTC)
I have works like that, like the Principia Discordia, but I'm not really comfortable with a work where we think there's significant copyrighted material and we can't clearly identify what is and what isn't clearly public domain.--Prosfilaes (talk) 07:34, 10 August 2017 (UTC)
I notice that the fair use images from Principia Discordia have been deleted from Commons, but hosting them locally along with the second license tag on that page is exactly what I am proposing to allow. —Beleg Tâl (talk) 14:38, 10 August 2017 (UTC)
I support this limited fair use in generally copyright-okay works like commons:Commons:De minimis.--Jusjih (talk) 20:33, 22 August 2017 (UTC)

In researching this further, I think I have shifted my position to Hesperian's point of view. First, the policies at WP and Commons pointed out that in the US, "fair use" is more of a defence in court against accusations of infringement, rather than an exemption from copyright at the time of publication. Second, it's true that it isn't in the spirit of w:free content, especially since anyone can take any section of a free content work and do what they like with it, which is not the case if they take a fair-use section from an otherwise free content work. —Beleg Tâl (talk) 14:50, 14 September 2017 (UTC)