Page:Wikipedia and Academic Libraries.djvu/295

Gavin Willshaw

transcription errors using Wikisource and reupload those transcriptions back into the Library’s repository, which improves the search function of its digital collection. The project can be deemed a success: it has been estimated that at its peak there were more National Library of Scotland staff working on the platform than all other Wikisource editors combined; the project is also thought, anecdotally, to have been the largest ever staff contribution to a Wikimedia project. As of November 27, 2020, 1,064 of the 3,000 Scottish chapbooks in the collection had been fully transcluded, with an additional 535 books fully proofread (“Wikiproject NLS Progress”).

However, despite all the effort in overcoming the challenges outlined above, the actual number of books fully transcribed has been quite low, and far lower than had been anticipated when the project started in March 2020. By the time the Library reopened in late July 2020, approximately 16,000 Scottish chapbook pages had been fully transcribed. Considering that approximately seventy staff had contributed to the project over a twenty-week period, the actual number of pages transcribed per person was only around ten per week. Based on this progress, it seems fair to conclude that if an organization’s sole reason for engaging with Wikisource is to improve the quality of transcriptions from their digitized books, rather than using Wikisource they would probably be better building their own OCR correction module or buying one of the various commercial transcription packages or services that exist. The different stages of the Wikisource workflow take a lot of time: each Scottish chapbook was worked on by at least five different people as it progressed from upload to transcription export to the Library’s gallery. Added to this, there are several manual elements to the process that are time-consuming, such as generating indexes on Wikisource by copying each individual URL from Wikimedia Commons, and frustrating, such as changing the OCR software from the default Tesseract engine to the far superior Google engine for every page. What is more, by adding books to Wikisource, there is an associated responsibility to manage the book once it is on the platform, to interact with and be guided by the existing community and to adhere as far as possible to their standards.
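The per-person rate quoted above can be checked with simple arithmetic. The following is a minimal sketch that uses only the approximate figures given in the text (16,000 pages, seventy staff, twenty weeks); the variable names are illustrative and not part of any project tooling.

```python
# Approximate figures taken from the text; variable names are illustrative.
pages_transcribed = 16_000   # pages fully transcribed by late July 2020
staff = 70                   # staff who contributed to the project
weeks = 20                   # length of the project period in weeks

# Average output per contributor over the whole period, then per week.
pages_per_person = pages_transcribed / staff
pages_per_person_per_week = pages_per_person / weeks

print(f"{pages_per_person:.0f} pages per person overall")
print(f"{pages_per_person_per_week:.1f} pages per person per week")
```

On these rounded inputs the rate works out to a little over eleven pages per person per week, which is consistent with the "around ten" stated above.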