Wikisource talk:Proofread of the Month

archived suggestions
Shortcut: WT:PotM


Please help start a list of texts that need to be proofread. Larger texts are preferred because we hope to have a large group of people working on the text of the month. Here is a great place to start looking for texts to be proofread.

List of suggested works not actioned


Links


Short works requiring validation

Have problematic pages
Translations, not eligible for simple listing

New works of less than 30 pages to be added to QUEUED


it:Wikisource:Rilettura del mese/Testi brevi

A list of potential PotM candidates

On the transcription project, there is a good list of texts that are ready to be proofread. That list is available here. This list continues to grow, so it would be great if we could knock it down. --Mattwj2002 11:03, 5 August 2008 (UTC)

My personal opinion: if people keep bringing in projects (and I have seen it), then they should do a good part of the editing. Some, whoever they are, bring in works for others to do and the workload adds up. Also, if the texts are brought in and left for others, then others may not like the topic, so the workload keeps building up. It would be nice to know [who] likes what to work on. *I* like history, and specifically illustrated history, but not children’s books or poems. I have several more volumes to do and more I want to do after that. This way I work on what I brought in, or have another bring in because he/they like the same kind of work. --Maury (talk) 01:33, 1 December 2016 (UTC)


Calendar 2021

List

Month Work Category Status
January Quirky
February Fine arts
March Wikipedia:WikiWomen's History Month / Woman author
April Poetry / Drama
May Geography
June Fiction: Novel
July Anthropology, Mythology, or Religion
August Biography
September Science/Technology
October Fiction (SF/Fantastical/etc.)
November Language
December Fiction: Short story collections

January 2021 (Quirky)

Are we going to line some new things up, or are we going to default back to The Placenta of a Lemur? BD2412 T 19:56, 27 December 2020 (UTC)

No objections: it is now indexed at Index:Quarterly Journal of Microscopical Science - New Series - Volume 61.pdf, starting at page 171.
Note, the plates are tightly bound, so we probably can't reconstruct the gutter from this scan: n540, n541, n544, n545, etc. Inductiveloadtalk/contribs 21:43, 27 December 2020 (UTC)
What would possess them to draw across the fold like that? Didn't they know someday someone would need to scan this on a futuristic piece of machinery? BD2412 T 22:24, 27 December 2020 (UTC)
Particularly frustrating as QJMS has really nice and unusual illustrations. Inductiveloadtalk/contribs 22:54, 27 December 2020 (UTC)
  • I have some suggestions, similar to the ones I produced last year; some can be replaced, if desired. They are as follows:
    • “Appeal to the Wealthy of the Land,” 44 pp. (transcription project)
    • “Attainder of Treason and Confiscation of the Property of Rebels,” 44 pp. (transcription project)
    • Loves Garland, 118 pp. (external scan)
    • “Muscles and Regions of the Neck,” 28 pp. (external scan)
    • “Notes Upon the Owners of the ‘Sancy’ Diamond,” 44 pp. (external scan)
    • “Notices of Roman Inscriptions Discovered in Northumberland,” 60 pp. (external scan)
    • “On the Character, Properties, and Uses of Eucalyptus globulus and Other Species of Eucalyptus,” 18 pp. (external scan)
    • “A Physical and Topographical Sketch of the Mississippi Territory, Lower Louisiana, and a Part of West Florida,” 42 pp. (external scan)
    • A Prospect of Manchester, 101 pp. (external scan)
    • “Remarks on the British Quarantine Laws,” 44 pp. (transcription project)
  • I have not thoroughly checked over these, so that should be done before any one is selected. TE(æ)A,ea. (talk) 22:57, 28 December 2020 (UTC).
    • Whatever else we do, we should definitely include “Remarks on the British Quarantine Laws” - that's timely. BD2412 T 18:55, 31 December 2020 (UTC)
We'll need a djvu version rather than pdf. The PotM templates assume djvu file extensions. Beeswaxcandle (talk) 23:00, 31 December 2020 (UTC)
With a million or so IA PDFs now at Commons, might be worth fixing the template instead? DJVU is great and all, but sometimes it's just a hurdle when the PDF exists at Commons and is readable and has OCR. That said, we might get paragraph breaks in the DJVU OCR soon (and AFAIK, PDF doesn't do that). Inductiveloadtalk/contribs 16:06, 1 January 2021 (UTC)
  • I have checked all of the works I listed (excepting the “Remarks,” as it was already created); they are intact, although the Google information page will need to be stripped from A Prospect. TE(æ)A,ea. (talk) 16:53, 1 January 2021 (UTC).
  • Work should start on these works as soon as possible; they should thus be added to the appropriate templates as soon as possible. Inductiveload, can you upload a DJVU of this file? TE(æ)A,ea. (talk) 02:29, 2 January 2021 (UTC).
  • @TE(æ)A,ea., @Inductiveload, @Beeswaxcandle, @BD2412: The first work for the month has been fully proofread; have the others been uploaded to Commons yet, and when do we start on those? DraconicDark (talk) 17:19, 14 January 2021 (UTC)

February (Fine arts)

March (Woman's history)

Proposed: the 1925 Anita Loos novel Gentlemen Prefer Blondes has just entered the public domain. BD2412 T 00:05, 2 January 2021 (UTC)

Index:The part taken by women in American history.djvu
Index:History of Woman Suffrage Volume 1.djvu
Index:History of Woman Suffrage Volume 2.djvu
Index:History of Woman Suffrage Volume 3.djvu
Index:History of Woman Suffrage Volume 4.djvu
Index:History of Woman Suffrage Volume 5.djvu
Index:History of Woman Suffrage Volume 6.djvu
Index:Women of the West.djvu --Slowking4Rama's revenge 23:14, 12 February 2021 (UTC)

  • It's decision time. BD2412 T 02:07, 1 March 2021 (UTC)
  • Gentlemen Prefer Blondes is fully proofread, and largely validated, with the exception of images. Once that is finished, I would recommend Women of the West as the next work. TE(æ)A,ea. (talk) 17:59, 7 March 2021 (UTC).
    • Maybe, finishing The part taken by women in American history.djvu and then something non-USA focused? I like the approach of using this series to complete and validate texts rather than starting from scratch which can be harder for inexperienced volunteers. Languageseeker (talk) 20:19, 7 March 2021 (UTC)
      • That work is mostly proofread anyway, but the index is not proofread; that will take some time to complete. I think choosing another work, perhaps a work of a famous woman author, would be better. TE(æ)A,ea. (talk) 14:03, 8 March 2021 (UTC).
        • I picked that one precisely because it’s so close to being done. It’s a huge work and finishing it during this month would be a great way to celebrate women in herstory. Languageseeker (talk) 18:30, 8 March 2021 (UTC)
  • The proofread/validation for Gentlemen Prefer Blondes is done, and since we still have 2/3 of the month left, I Support Women of the West, second choice The part taken by women in American history, third choice something not on this list (those History of Woman Suffrage volumes seem too unwieldy for a mid-month replacement). Clay (talk) 15:44, 9 March 2021 (UTC)

April (Poetry / Drama)

May (Geography)

June (Fiction: Novel)

  • The Great Gatsby in the news [7], [8], gutenberg here [9]; web1.0 [10] editions here: [11], [12], [13], [14], [15], [16], [17] Slowking4Rama's revenge 01:36, 1 January 2021 (UTC)
    greatgatsby0000unse_c4w8 uploaded at Index:The Great Gatsby - Fitzgerald - 1925.djvu (the only one available from the IA not in the "borrowing library"). And it doesn't have a copyrighted intro. Inductiveloadtalk/contribs 16:06, 1 January 2021 (UTC)
    • I have moved this discussion to June, as that month features novels; I certainly think that this work should be proofread. TE(æ)A,ea. (talk) 16:39, 1 January 2021 (UTC).
    • I see that Phe-bot is creating the pages, but there are many, many errors in punctuation, and even in the text included on the page. Especially words at the start or end of a page are incorrect, or missing, or are from a neighboring page. --EncycloPetey (talk) 17:07, 1 January 2021 (UTC)
      • That's common for match and split (it can get confused by hyphenations and sometimes loses track slightly), it's why the pages are created "red" rather than "yellow". Inductiveloadtalk/contribs 17:09, 1 January 2021 (UTC)
        • But that only happens when the text is being imported from a separate source instead of using the text layer in the source file. That's why match-and-split is done from a proofread copy, not for a newly added work. For a newly added work, the text layer of the file should be used, because editors of different editions make different editorial changes. --EncycloPetey (talk) 17:35, 1 January 2021 (UTC)
          • It's coming from the PG copy, which is almost right except for some very minor differences. It's definitely closer to what's on the page than the raw OCR. It needs checking in any case. Inductiveloadtalk/contribs 17:41, 1 January 2021 (UTC)
            • But those are the sort of differences that WS proofreaders don't expect. For example, you've proofread page 11, but there is still some punctuation missing that is present in the scan. OCR text generally isn't missing punctuation, so we're not in the habit of carefully checking it. Since the text comes from an external source and has gone through PG editorializing, we need to pay careful attention to the punctuation beyond what we normally do. --EncycloPetey (talk) 17:48, 1 January 2021 (UTC)
              • it will be done by June, so don't bother putting it in the queue. Slowking4Rama's revenge 23:08, 1 January 2021 (UTC)
              • And EncycloPetey's prophecy came true: look at the punctuation errors on page 11 that got past the validator. [18] BethNaught (talk) 23:20, 1 January 2021 (UTC)
                • gutenberg copy editing is different, (checking punctuation, small caps, italics, dashes, and page breaks) and we should be welcoming to the newbies (or forgetful outsiders) who are attracted to the "in the news" works. Slowking4Rama's revenge 13:44, 2 January 2021 (UTC)
                  • Phe-bot is not run by newbies. The issue isn't about welcoming newbies, it's about setting them up for failure with a flawed text from the get-go. If we want to have a Gutenberg edition, then it should be added as a Gutenberg edition, not entered under the misleading guise of the text from a scan. --EncycloPetey (talk) 18:24, 4 January 2021 (UTC)
  • Since there are so many great novels from 1925, I was thinking about making June a month to celebrate the major novels from 1925 that have entered the PD. I've created several lists here and I'm working on creating Indexes for all of them. It'll be a great way to prepare for some summer beach reading. Languageseeker (talk) 02:07, 23 March 2021 (UTC)

July (Anthropology, Mythology, or Religion)

August (Biography)

September (Science/Technology)

  • International Library of Technology volume 1B: Lathe Work; Planer Work; Shaper and Slotter Work; Drilling and Boring; Milling Machines (transcription project)
  • In absence of a “natural sciences” category, and with some opposition to the above work (on account of its length), I nominate The Conchologist's First Book (external scan), a major work in the development of conchology as a popular discipline, especially in the U.S. (where it was first published). The scan above given is the first edition. This work is by far the author’s best-selling work, and contains many images besides, which make the work more appealing. TE(æ)A,ea. (talk) 02:15, 4 June 2021 (UTC)

October (Fiction: SF/Fantastical)

  • I nominate [21] Hoffman's Strange Stories, a collection of English translations of the popular stories by E. T. A. Hoffmann, who is best remembered for The Nutcracker and the Mouse King, which was adapted into the well-known ballet. Several of his other stories have inspired well-known composers to write music and operas, but thus far The Nutcracker... is the only one of his stories for which we have a complete text. --EncycloPetey (talk) 22:27, 16 January 2021 (UTC)

November (Language)

  • We seem to have no works about the French language. Does someone have a recommendation? Perhaps a seminal work on the subject? --EncycloPetey (talk) 23:19, 2 February 2021 (UTC)

December (Fiction: Short story collections)

Better Current Collaborations Section

I don't think that the Current Collaborations section on the front page is serving us too well. Instead, I propose that we have a running list that automatically progresses to the next work once the current one is done. Furthermore, I propose that we divide the texts into four categories and display all four at once to allow users more choice and cater to more skill levels.

  1. Easy - These texts will be proofread texts that need validation. They will serve to introduce users to wikicode through an immersive environment and provide a low barrier to entry.
  2. Medium - These texts will require proofreading, but have fairly decent OCR. They can be novels or books imported from PGDP.
  3. Hard - These books have more complex layouts and may have more garbled OCR, but they should not present too great a challenge. Perhaps a book with lots of images or a few tables, or a pre-19th century work with long s and ligatures.
  4. Challenge - These are probably mainly reference books or manuscripts. Lots of complex formatting required.

In this way, users can select from the range of difficulty and have a choice of which book they wish to proofread. Languageseeker (talk) 01:09, 10 March 2021 (UTC)

I do agree that we need some method of automatically queueing up the next POTM when the current one is finished. BD2412 T 06:16, 23 March 2021 (UTC)
PotMs can now be queued up in Module:PotM/data and will auto-advance at the end of the month, updating both {{PotM}} and {{Collaboration/POTM}} automatically. Currently, April is set to be Index:The torrent and The night before.djvu, but that was a random choice from the proposals since there has been no other discussion there. You can also pre-queue further works in a month, but if prior works aren't done, you need to set current, otherwise it'll point to the last work in that month's list by default. Inductiveloadtalk/contribs 07:33, 23 March 2021 (UTC)
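For readers following along, the selection rule described above can be sketched roughly as follows. This is only an illustration in Python, not the actual Lua in Module:PotM; the `QUEUE` layout, the `works` and `current` keys, the 1-based position, and the "Example work" index names are all assumptions made for the example.

```python
from datetime import date

# Hypothetical queue keyed by (year, month). The real Module:PotM/data
# layout may differ; the "Example work" index names are placeholders.
QUEUE = {
    (2021, 3): {"works": ["Index:Example work A.djvu",
                          "Index:Example work B.djvu"]},
    (2021, 4): {"works": ["Index:The torrent and The night before.djvu"]},
}


def current_potm(queue, today):
    """Return the work to feature on the given date, or None if none is queued."""
    entry = queue.get((today.year, today.month))
    if not entry or not entry.get("works"):
        return None
    works = entry["works"]
    # Without an explicit 'current' (assumed 1-based here), fall back to
    # the last work queued for that month, as described in the comment above.
    position = entry.get("current", len(works))
    return works[position - 1]
```

Because the lookup is keyed on the date, the featured work changes at the month boundary without any edit. Setting `"current": 1` on a month's entry would keep the first work featured until it is done.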
@Inductiveload: Thank you for this. This looks awesome. Do you think it would be possible to add navigation arrows so that users can choose to look at the other works and potentially work on them? Also, would it be possible to add a description section so that we could add one or two sentences about the book? Languageseeker (talk) 01:39, 24 March 2021 (UTC)
Re the arrows, that would be kind of awkward in the Wikitext paradigm without some kind of JS to dynamically load them in. Which would need to be globally loaded (and maintained into perpetuity), so it's a bit impractical for a single-purpose thing. If you wanted something more fancy and web-2.0'y, you should probably consider writing a Toolforge tool that serves a web app of some sort.
Descriptions would also be possible, technically speaking, to add to {{PotM/base}}. In theory you could show a more detailed list at Wikisource:Proofread of the Month than a "normal" invocation of {{PotM}}. Inductiveloadtalk/contribs 18:20, 24 March 2021 (UTC)
@Inductiveload: The arrow thing sounds like way more work than it's worth. What about a simple textual Previous/Next? Is that as much work? If it is, then it's also not worth it.
How would I go about adding a Description to {{PotM/base}}? Languageseeker (talk) 18:47, 24 March 2021 (UTC)
@Languageseeker: Previous/next would require a page to use as a target. If the text doesn't exist as a Wikisource page you can reach with a [[Link]], you'd need a script to do it. {{New texts}} has archives, but PotM's method has been to just add works to the gallery at Wikisource:Proofread_of_the_Month. In theory, if the PotM data was backfilled, that gallery could be auto-generated.
To add a description to {{PotM/base}} is a matter of editing the template and figuring out a nice place to put it, if the parameter is given (I guess just before {{#switch:{{{option}}}...). Since this is a core template, it's actually protected, so you'll have to do it in {{PotM/base/sandbox}}. Then you need to get Module:PotM to pass descriptions to the template (near line 117) if present in the data item. Inductiveloadtalk/contribs 19:21, 24 March 2021 (UTC)
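The "pass it only if present" step might look something like the sketch below. It is written in Python purely for illustration (the real module is Lua), and the parameter names `index` and `description`, as well as the shape of the data item, are assumptions rather than the template's actual interface.

```python
def template_args(item):
    """Build template arguments from a data item, adding the description only if set."""
    # 'index' and 'description' are hypothetical parameter names.
    args = {"index": item["index"]}
    description = item.get("description")
    if description:
        args["description"] = description
    return args


def render_invocation(item, template="PotM/base/sandbox"):
    """Render a wikitext template call for the item, for previewing in a sandbox."""
    parts = [template] + [f"{key}={value}" for key, value in template_args(item).items()]
    return "{{" + "|".join(parts) + "}}"
```

An item without a description produces the same call as before, so existing entries would not need to change.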

Core texts

I think the entire idea of a work a month does not work. Not everyone wants to proofread the same text at once. I think that we should have a list of several works every month, including half-finished texts such as the one you mentioned. Progress would be measured by the number of pages proofread. Then, we could build a core set of key texts and clear the backlog. However, if we maintain our present system, then this work would be out of scope for this month. Languageseeker (talk) 19:17, 20 April 2021 (UTC)

@Languageseeker: pardon a change, the text you mentioned has been removed. I've changed the header to reflect your proposal, please modify that if so inclined. CYGNIS INSIGNIS 13:05, 23 April 2021 (UTC)
There is no requirement to work on the PotM, and there are hundreds and hundreds of texts with Index pages awaiting proofreading. Editors are free to work on what they wish from that very large selection. Various editors here have compiled personal lists of what they think ought to be worked on, and everyone's list is different. --EncycloPetey (talk) 16:50, 23 April 2021 (UTC)
@EncycloPetey: I agree that everyone has a different list, but there are some works that commonly make it to top lists and that should have scan-backed editions. These are the texts that will attract users. Right now, we don’t have scan-backed copies of Hamlet, The Picture of Dorian Gray, or Clarissa, to name a few. Could texts such as these not be considered core texts? Languageseeker (talk) 17:13, 23 April 2021 (UTC)
Yes and no. Fiction is just one area that Wikisource covers; there is also non-fiction to be considered. Further, Hamlet is a play (a long one), and when we have previously tried community proofreading of a play it did not go well. Dramatic works require an advanced knowledge of formatting that is not conducive to community efforts. Wikisource also does works that have been translated into English. We do not have a copy of The Mahabharat, nor do we have scan-backed copies of Rumi's Masnavi I Ma'navi or Ovid's Ars Amatoria. We have almost no translations of major works in French, Spanish, and Italian. There is so much missing that it would be an enormous list were we to compile it. It might be worth assembling such a list, but selecting good scans for these works would require care, dedication, and expert knowledge or research in some cases. And many of these works are beyond the scope of a single month's transcription. --EncycloPetey (talk) 17:37, 23 April 2021 (UTC)
I think a WikiProject Core Texts would be a really very good idea, but it needs a lot of curation and meta-editing to get it usable. Dumping a list of Wikipedia's 100 greatest books and letting it rot won't achieve anything. For example, Wikisource:Requested texts is pretty moribund, because it's way way easier to drive by and say "Oi, we should have X, Y and Z, hop to it" than arrange decent scans and actually do it. Even when scans are (eventually) sourced, requesting users very rarely actually do it. Furthermore, I speculate that one reason we have such poor "core" coverage is because most "core" texts are already trivially available online and people would rather do something new than painstakingly patch (or re-patch) a PG text into a scan-backed format.
A hypothetical WP:CT would presumably take on some of the responsibility of finding, processing, uploading and generally organising the "core texts" to facilitate proofreading by others, especially (but not limited to) newcomers.
In my internal fan-fiction of how that might go, it would be partnered with an analogue to the frWS "Mission 7500" (formerly Défi 5000), where we have some kind of automated back-end that allows us to track progress, which (at least at frWS) appears to be a good rallying point for many people. Again, this is something that takes real effort to organise and maintain; it doesn't just happen. I'm unconvinced that the frWS bot-edited page method is the most efficient way (I'd think some kind of web-app on the Toolforge combined with some kind of on-wiki list/JSON/categorisation would be more snazzy, responsive and need less maintenance), but it certainly seems to work well for them.
I have deliberately glossed over the issue of determining what is and isn't "core".
Tl;dr it's a lot of work and needs concerted effort over a fairly long term. Inductiveloadtalk/contribs 17:55, 23 April 2021 (UTC)
Re: most "core" texts are already trivially available online and people would rather do something new: In some cases, it's possible to do both. For example, when transcribing the dramatic works of Aeschylus and Sophocles, I deliberately went for editions by other translators. That is, nearly every on-line copy of Sophocles' plays in English comes from the translations by Storr that were published in the Loeb Classics, which went up on one site and then got copied over and over to other sites. And nearly every library copy in the US is Campbell's translation. So I opted to transcribe the editions of Jebb and Plumptre, which were not the same copies everyone else offered. Wikisource now has the plays of Sophocles, but not the same copies every other site has. Doing this won't always be possible in every case, but it is something to consider. Similarly, most translations of Russian novels online are the translations by Constance Garnett, which are generally held to be mediocre, so Wikisource could strive to find public domain copies by other translators. --EncycloPetey (talk) 18:10, 23 April 2021 (UTC)
@EncycloPetey, @Inductiveload: I definitely don’t want to create a dumping ground for important texts that nobody works on. I also hear that what is important varies and that many of these texts can be found online. However, the goal is to attract users to Wikisource. Even if a text is available online, it still makes sense to create a scan-backed copy here. These are the texts that will bring users. Also, many of the online copies are of dubious quality, so having a better-quality copy makes sense. Lastly, by creating a high-quality etext, this site can help to make these texts available to a broader community and end systemic inequality.
The selection of the texts will still have a communal nature, but will now explicitly aim to build up the core collection. We can do multiple translations or editions as long as they make sense.
I’m also not proposing to turn this into a series of dead, white men who wrote fiction. Instead, I think it makes sense to split the works into categories: LGBTQ+, Black Authors, Classical Text, Translations, Women Writers, Scientific Texts, etc.
I think that we should at least try the French model. I’m happy to do the translations and create the templates, if an administrator is willing to import the bot. For May, we can select 15 texts. 8 to proofread and 7 to validate. It could be a good test to see how this community reacts. Languageseeker (talk) 18:46, 23 April 2021 (UTC)
Not supported. From my several years' experience of being the primary co-ordinator of PotM, large projects do not work. There needs to be one work at a time on the Mainpage. The broad domains of knowledge that are in the annual calendar were set up to ensure that there is a good range of works worked on to appeal to various interests. Over the years we've interpreted those domains with some latitude. Also, I find the concept of "core texts" to be paternalistic at best and dictatorial at worst.

"Oh, you haven't read XXX? Well, that's absolutely dreadful. Of course, it's so much better in the original language, but you must still read it before you can consider yourself to be a well-rounded member of society."

Yes, we want to make available the works that are widely considered to be the best ever written, but we must also make the also-rans available. If we don't, then we are not following our neutrality stance. Beeswaxcandle (talk) 19:35, 23 April 2021 (UTC)
@Languageseeker: I think the most useful thing you could do up-front, while we ask for the bot to be approved and set up for us, is to detail exactly how Mission 7500 works at frWS and construct a draft page for enWS. Importantly, a list of monthly tasks that are needed to keep the wheels turning, as that is probably where it'll all seize up à la PotM, so we need to figure out a way to minimise that list. Likely we'd also evolve some different "rules of engagement". And it needs a name: Mission 7500 is unlikely to work since I doubt we can do 7500 pages right off the bat! Inductiveloadtalk/contribs 19:16, 23 April 2021 (UTC)
@Inductiveload: I created an English translation at User:Languageseeker/POTM and imported all the relevant templates. Most of the heavy work appears to be done by the bot that generates the statistics. The two main tasks that will require human intervention are 1) Selection of Texts 2) Cutting and pasting the individual texts as they get worked on. Languageseeker (talk) 20:14, 23 April 2021 (UTC)
@Languageseeker: to be clear, I don't think we should really be aiming to replace PotM with this. To me (and I know there are many interpretations) PotM is more a way to get an "interesting" and accessible work, rather than an "important" one, out for collaboration.
This seems more like a parallel thing. Perhaps after some time it'll become a primary driver of progress, or perhaps it will fizzle out, but I think either way we should let it be its own thing rather than an attempt to replace the existing collaborations. Inductiveloadtalk/contribs 22:08, 23 April 2021 (UTC)
@Inductiveload: I think it’s important to feature it on the front page even if we don’t call it PoTM. Maybe, for now, we can replace the Current Collaboration with this? Or, do you have a different term in mind? Above all, I want to give it a fair shot to see if it will work. Languageseeker (talk) 23:23, 23 April 2021 (UTC)

Change May to Harlem Renaissance

I think that it would be great if Wikisource expanded its collection of writing by persons of color. For next month, there are no good suggestions. Instead, I propose to change next month to The Harlem Renaissance so that Wikisource will include this major literary movement and achievement of African Americans. For the first text, I propose The New Negro. Languageseeker (talk) 13:30, 22 April 2021 (UTC)

This feels like a much bigger project than would last a month. If a collection of works with scans and OCR layers can be accumulated, it might make a good Community Collaboration in a few months' time. --EncycloPetey (talk) 21:16, 22 April 2021 (UTC)
A Community Collaboration would be the best approach to this idea. The current Collaboration is stale and was only put back in to the templates to tide us over until a new Collaboration could be decided upon. PotM needs to be a smaller project. In the past when we've tried a larger project we've had very little traction. Beeswaxcandle (talk) 18:55, 23 April 2021 (UTC)

Propose work by Chesterton

Before I start on this, I thought something like this might be an enjoyable work for this sub-project: Index:What I saw in America.djvu by Author:G. K. Chesterton. If it won't slot into a theme in the next year or so, then let me know and I will keep it to meself. CYGNIS INSIGNIS 14:27, 14 May 2021 (UTC)

@Cygnis insignis: I would be happy to feature it as part of the Monthly Challenge for next month if you would like. Languageseeker (talk) 16:43, 14 May 2021 (UTC)
I'm not aware of what that is, but anyone is welcome to improve the index. CYGNIS INSIGNIS 16:52, 14 May 2021 (UTC)
@Cygnis insignis: see Wikisource:Monthly Challenge. Nominations are here. Inductiveloadtalk/contribs 17:58, 14 May 2021 (UTC)