Wikisource talk:Monthly Challenge/Archives/2021-05


Nomination themes

I think we should have a handful of semi-formal "reserved slots" (so there are always at least n—probably 1—works of a type at the start of each month) for certain categories of works on top of the existing "tags", focused more on WS-internal themes and prodding a process along, however slowly. Some ideas for such a category:

Obviously we should try to find works that we think are somewhat useful and likely to be worked on. Inductiveloadtalk/contribs 15:46, 4 May 2021 (UTC)

Absolutely agree with you on themes. I would like to reserve a slot for a woman writer, a black writer, a non-western writer, a major author, a major text that entered the public domain, a pre-nineteenth century work, and a major periodical. I would also like to centralize all the various places that we send new users into one. The nomination should replace requested texts. Languageseeker (talk) 05:09, 8 May 2021 (UTC)
I'm not sure about reserving a slot for black writers specifically, since that's rather a US-centric view of ethnic representation, IMO. Perhaps it makes more sense to divide into "Western minority" and "non-Western". The realities of history and copyright make nearly all eligible works of the former category those of black writers, because most other minorities in the West often retained their original national identities enough to be classified as "non-Western" authors. However, it doesn't seem fair to me to exclude, say, British-Chinese authors, since they're also a minority. For example w:Sui Sin Far is a British-American author of (part) Chinese background who has nothing at WS. There aren't actually many authors like her who are Western nationals but minority ethnicity and lived long enough ago to have eligible works, so it seems only fair to include her and people like her in a dedicated slot? Inductiveloadtalk/contribs 00:26, 9 May 2021 (UTC)
Yes, minority author is a much better way of phrasing it. However, I also don't want to lump all minorities into a bin that reduces the total number of slots available to them. I certainly would not like to pit one minority against another. To avoid this, do you think it would make sense to think in terms of percentages? This is always such an important but delicate conversation that I get lost.
I would also like to give voice to indigenous voices.
Also, I have no problem with reserving some slots from the requested text category.
Above all, I want the Monthly Challenge to represent a diverse category of works and not just dead white men. Languageseeker (talk) 00:46, 9 May 2021 (UTC)
I don't think "percentages" is a good idea. Percentages of what? In what country? In what era: 1926, 1800, 1700? I think just have the slot and address perceived imbalances through the nomination process. While the slot may be "reserved", it still needs to be filled by a nomination. Also there is certainly nothing stopping us having more works than reserved slots, especially if a work covers more than one reserved slot. I would say indigenous authors probably can be rolled into "non-Western" (taking Western as a cultural thing rather than geographic), if only because I'm not sure there will be enough PD works of indigenous authors. But we can still take it into account if one gets nominated.
The whole process should IMO be flexible. The slots idea is just to encourage diversity in nominations plus get some eyes on a few maintenance facets. What actually happens depends on what gets nominated, not to mention how many works get completed to make space: at the current rates, we'll have no space for new works in June anyway! Inductiveloadtalk/contribs 12:13, 9 May 2021 (UTC)

US copyright status of Mathnawi vol. II

As this was published in the UK in 1926 and the author died in 1945 is this PD-nonrenewal or is it still copyrighted in the US under URAA until January? MarkLSteadman (talk) 23:27, 10 May 2021 (UTC)

@MarkLSteadman: I fear you are right. Referred to Wikisource:Copyright_discussions#Index:The_Mesnevī_(Volume_2).pdf. Inductiveloadtalk/contribs 18:07, 14 May 2021 (UTC)
@MarkLSteadman: Regrettably, I think that this is probably the most prudent course. We can feature it next year as part of Celebrating the Public Domain. Languageseeker (talk) 20:00, 14 May 2021 (UTC)

Completion and rolling over of volumes of which only part is in the MC

In MC 2021-05, we have at least two works which are backed by scans that include lots of other matter: The Time Machine and Dorian Gray. Do we consider those works "done" for the purposes of the MC when the relevant sections are complete?

Obviously it would be nice to get the whole volume done, but, certainly for Lippincott's 46, that's a lot of material (800 dense pages) that are probably not going to get done, since they're not particularly "special". So it might make sense to allow them to be shunted off once they're validated?

If so, I might also add a field to the data table to allow us to forcibly mark an index "proofread" or "validated", even if not all pages are complete. Inductiveloadtalk/contribs 18:14, 14 May 2021 (UTC)

@Inductiveload: I've been thinking about this quite a bit. For periodicals, I think it makes more sense to proofread them in parts. While it would be great to have the entire Lippincott's 46, much of that volume will attract little attention. Proofreading in bits will also make it easier to create scan-backed editions of major novels that were serialized in periodical form.
BTW, for Dorian Gray, the author is not being set in the epub export.
For next month, I'm thinking about replacing "Dorian Gray" with "The Sign of the Four" from Lippincott's 45. Agree?
For "The Time Machine", the entire Volume 1 should be proofread because all of it is a unique edition of H.G. Wells that is worth keeping.
In general, I'm wondering if it makes that much sense to keep the texts in the MC after they've been proofread. Yes, validation is great, but it takes away from proofreading. Would we like to have one text that is 99.9% accurate or two texts that are 98% accurate? Scan-backing means that a text can be corrected at any point in the future and that might be a better approach to going through the Oldies backlog.
What would you think of this proposal?
  1. Remove all proofread and fully transcluded texts at the end of the month.
  2. For periodicals, each Monthly Challenge must make clear which section(s) of the periodical should be proofread. Once those sections have been completed, the periodical will be removed from the MC at the end of the month. If the MC text is a serialization, the next part of the serial will be advanced as soon as the previous part is completed, even if this occurs before the end of the month. Languageseeker (talk) 19:57, 14 May 2021 (UTC)
@Languageseeker: I think it is perfectly fine to have bits of a periodical in an MC. For example, you can have a single work like Dorian or you could have just a single issue of a volume at a time. Some periodicals like Lippincott's are so big I really don't think it's likely we'd ever get a whole volume proofread, let alone validated. Definitely make which bits are in the MC clear.
I'm neutral on adding works mid-month. Bear in mind that it will cause some stats to be adjusted (for example anything using total pages).
RE removing after proofreading, I dunno really. I guess it depends on how much validation we see vs. proofreading, which will take some time to stabilise to a steady state signal. Some people prefer to validate, others to proofread, so I don't know if keeping works around to validate will take too much away from the proofreading. If we scrap them after one month, we'll basically never see a validated work. I don't personally mind (I'm a 196%er myself) but it is part of Wikisource that some people really value.
Maybe just allow them to idle in the "to be validated" section, and if no-one does them (in general, I predict they won't, but I could be wrong), they'll expire naturally. We could move the "to validate" section(s) to the bottom of the page? Remember that they'll get their own "one month old" section.
I'll have to think about the export author thing when exporting a section of a larger work.
Another edition of The Sign of the Four is already proofread from scans, so is there something we can do that would be a totally new thing? Out of all the bazillions of literary journals, there must be tons of stuff that's missing (or not scan-backed)? Inductiveloadtalk/contribs 20:22, 14 May 2021 (UTC)
@Inductiveload: I think that moving the To Validate texts to the bottom is a good idea.
Most of the popular parts of monthly magazines were reprinted, which is why we know about them. However, texts often changed between the magazine version and the published version. Proofreading the original magazine version creates a new way of experiencing an already familiar text.
There is no online edition of the Lippincott's text so that would attract fans of Sherlock Holmes to us. It’s only 100 pages so it’s not that much to proofread. I see it as a short and cool text to do quickly. Languageseeker (talk) 21:41, 14 May 2021 (UTC)
@Languageseeker: the exports now take the section_author field value if set. So Dorian takes "Oscar Wilde". If you were to export all of Lippincott's 46, it would not have an author set.
Fine, then add it to the nominations. I don't have a better idea for now. I don't disagree that we should have all the relevant versions, it's just I think we'd be better off having at least one version of everything than 2 versions of half as many things, especially when we know we have way less than half the things to start with. Again, whatever people want to work on.
"...so that would attract fans of Sherlock Holmes to us." Well, it would if we had, like, any messaging outside enWS :-/ Maybe we need a Tw*tterbot (insert your own vowel).
Inductiveloadtalk/contribs 16:08, 15 May 2021 (UTC)
@Inductiveload: Maybe I'll save Holmes for another time. We have too many works already.
A twitter bot would be great for Wikisource in general. The more promotion the more users we will attract.
Also, could you add a Validation = true field in the data so that we can move works such as Dorian to the To Validate section? Languageseeker (talk) 01:00, 19 May 2021 (UTC)
@Languageseeker:   Done - set status = 'proofread' to override the automatic proofread detection. Inductiveloadtalk/contribs 07:30, 19 May 2021 (UTC)
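Editorial illustration: the override described above can be sketched roughly as follows. This is a minimal Python sketch, not the actual module code; only the `status` field name and the value 'proofread' come from the discussion, while the function name, the page-status strings, and the "in progress" label are assumptions made for illustration.

```python
from typing import Optional


def effective_status(page_statuses: list[str], status: Optional[str] = None) -> str:
    """Return the status to display for an index.

    `status` mirrors the manual override field mentioned in the thread;
    everything else here is a hypothetical stand-in for the real detection.
    """
    # A manually set status always wins over automatic detection.
    if status is not None:
        return status
    # Automatic detection: every page must have reached the given level.
    if page_statuses and all(p == "validated" for p in page_statuses):
        return "validated"
    if page_statuses and all(p in ("proofread", "validated") for p in page_statuses):
        return "proofread"
    return "in progress"
```

With an override, a volume such as Lippincott's 46 could be listed as proofread even though most of its pages outside the targeted section are untouched.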

Duplicated titles

In the works listing, I see that Heart of Darkness is under the category Novel, while this current sprint at the time of writing is Novels. This comment might be in the wrong place, however I just want to note that that's there, and that's probably an accident. Testingitro (talk) 00:35, 22 May 2021 (UTC)

Shakespeare First Folio

Is there a particular reason why someone has started another index page for this when there is already a version Index:Shakespeare - First Folio Faithfully Reproduced, Methuen, 1910.djvu that is approximately 50% progressed? Perhaps it's for the same reason that incomplete scans of Paradise Lost keep being uploaded. Chrisguise (talk) 00:30, 29 May 2021 (UTC)

@Chrisguise: Yes, there were a number of reasons for doing so, with these being the principal ones:
  1. The Methuen is a valuable and important text, but it does not come from a single folio, but a combination of folios. Therefore, it's a new edition of the first folio. The differences are probably minor, but I wanted there to be a digital edition of the First Folio that can be matched to an exact physical one.
  2. The scans for West 190 are full resolution and not a compressed DJVU which makes them easier to read. Furthermore, the Methuen text is a printed engraving of a photograph making it more difficult to read than the original. This also means that the Methuen will have lower quality woodprints than the original source.
  3. The Methuen is transcribed with certain modernizations of orthography such as the replacement of the long s with a regular s. I wanted to preserve the original orthography.
Languageseeker (talk) 00:21, 31 May 2021 (UTC)

Planning for June

@Inductiveload: For June, I think that we should have the following changes.

  1. Make the root page Wikisource:Community collaboration/Monthly Challenge/Current Challenge.
  2. Retire Mathnawí early due to potential copyright challenges.
  3. Retire The Atlantic Monthly (Volume 1), The Strand Magazine (Volume 1), Nature (Volume 1) early because these serials are probably better dealt with on an individual article basis.
  4. Would it be possible to create an award each month for the users who proofread/validated the highest, second-highest, and third-highest number of pages?

Thoughts? Languageseeker (talk) 01:44, 27 May 2021 (UTC)

@Languageseeker: I'm not sure if the root page is a good idea to be the current month. It makes sense for getting people to the current works ASAP, but it hides the rest of the stuff. Maybe a nice big "current month" link? The bold "Monthly Challenge" link on the front page goes to the active month as it is.
I'll have a look at a way to "retire" works early. It'll have to go in the data table. We also need to have a way to record "completions" in the data table. Probably this should be manual, since it'll be fragile and awkward to work this out "online" all the time (in particular it's vulnerable to breakage down the line when the MC is long-finished and people mess with the index).
User awards are certainly possible, since the information is recorded in the DB. But, not all users appreciate being entered into league tables by default. I'm pretty sure there was once some kind of collaboration before where there was an opt-out (or maybe opt-in, and may not have been enWS) list that was used by quite a few people. Inductiveloadtalk/contribs 09:38, 28 May 2021 (UTC)
@Inductiveload: Can you set up the page for June? You know the infrastructure better than I do and I don't want to make a mess that requires untangling. I'm happy to add the volumes, but don't want to damage things.
For retiring texts early, maybe it's possible to just add a no display field so that the text can still be in the data table, but no longer visible on the page?
Good point about privacy. I always think that opt-in is better than opt-out. It's also best to make the stats as anonymous as possible, e.g. Languageseeker_total = X not Languageseeker_total_for_work_y = x. Languageseeker (talk) 00:17, 31 May 2021 (UTC)
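Editorial illustration: a minimal sketch of the opt-in, per-user-total idea above, assuming contribution records are available as (user, work, pages) tuples. The record shape, the function name, and the opt-in set are all hypothetical; nothing here reflects how the actual database is queried.

```python
from collections import Counter


def top_contributors(records, opted_in, n=3):
    """Return the top `n` opted-in users by total pages.

    Only a per-user total is kept; the per-work breakdown is discarded,
    and users who have not opted in are never counted.
    """
    totals = Counter()
    for user, _work, pages in records:
        if user in opted_in:
            totals[user] += pages
    return totals.most_common(n)
```

Keeping only the total per user matches the "Languageseeker_total = X" style of statistic suggested above.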
@Languageseeker: Pages now set up. The indexes need to be added to the category for the script to pick them up. Probably needs automating one day.
I have retired Mathnawí (via the last_month parameter in the data table). For the other three, do we have replacements? Or will First Folio be the only immortal work?
We also need to add new works for June to the data table, which I will do, but you should check and make adjustments if you don't agree.
We now have ~10 hours to break 3000 pages! Inductiveloadtalk/contribs 13:51, 31 May 2021 (UTC)
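Editorial illustration: retiring a work through a `last_month` value could work roughly as sketched below. Only the `last_month` name comes from the discussion; the `first_month` field, the "YYYY-MM" string format, and the assumption that `last_month` is inclusive are hypothetical.

```python
def is_active(entry: dict, month: str) -> bool:
    """Return True if a data-table entry belongs in the given month.

    Months are "YYYY-MM" strings, which sort correctly as plain text.
    A missing `last_month` means the work has not been retired.
    """
    last = entry.get("last_month")
    return entry["first_month"] <= month and (last is None or month <= last)
```

Under these assumptions, an entry with `last_month` set to "2021-05" would appear in May and drop out of the June listing without being deleted from the data table.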
Great work! I tagged the new works for June. My plan is just to retire the periodicals and not replace them with anything. Maybe, we can get rid of the no expiry section and just blend Shakespeare in with the rest. It also might make sense to get rid of the two less than 50 pages section. Here’s to breaking 3,000. Languageseeker (talk) 14:07, 31 May 2021 (UTC)
For Shakespeare, that could work if we target one play at a time within the index (like we did for Dorian Gray within Lippincott's 46). In that case, we probably should keep it as no-expiry, because it'll take aaaages to chew into it. But the issue then is that the old months will show the current month's targeted section instead of what they were at the time.
At this rate it might be simpler to curate the data tables by month and manually copy rolled-over items, which means it's easier to make customisations without affecting past data tables. Not really an enormous amount more work than currently, and actually might simplify modules. Inductiveloadtalk/contribs 14:17, 31 May 2021 (UTC)
Actually, targeting a specific play might not be a bad idea and might get more done. 20-30 pages is certainly less intimidating than 900+ and should be doable in 3 months. How about retiring the First Folio and replacing it with The Tempest (First Folio)? Languageseeker (talk) 14:23, 31 May 2021 (UTC)
@Inductiveload: Also, since the FF is an image-based index, shouldn't it be possible to create an index just for a play that will also update the index for the entire play? This could make it even less intimidating. Languageseeker (talk) 14:37, 31 May 2021 (UTC)
@Languageseeker: I have changed to month-by-month data tables, so the June data is now at Module:Monthly Challenge/data/2021-06. This makes it a lot easier to have very fine grained control over what's in a month's challenge without having to special-case lots of indexes. I still need to fiddle with things like sprints, but the main listings are using the monthly data already.
For FF, I see where you're coming from, but I'm not sure breaking our "index is an edition" convention is a great idea. What we can do is split up the page list like Index:Ferrier's Works Volume 3 "Philosophical Remains" (1883 ed.).djvu. That breaks the interactive page lister but the page list is done so that's not really an issue for me. Inductiveloadtalk/contribs 15:10, 31 May 2021 (UTC)
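Editorial illustration: the month-by-month approach, where rolled-over items are copied into a new table so that later changes do not affect past months, might look like the sketch below. The real data lives in wiki modules such as Module:Monthly Challenge/data/2021-06; this Python version, including the function name, the dictionary layout, and the `section` field, is purely a hypothetical model of the idea.

```python
import copy


def roll_over(previous: dict, completed: set, additions: dict) -> dict:
    """Build next month's table from the previous month's table.

    Unfinished entries are deep-copied so that editing the new month
    (for example, targeting a different section) leaves the old month intact.
    """
    next_month = {
        index: copy.deepcopy(entry)
        for index, entry in previous.items()
        if index not in completed
    }
    next_month.update(copy.deepcopy(additions))
    return next_month
```

Because each month owns its own copy, changing the targeted play for the First Folio in a later month would not rewrite what earlier months display.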