
From documents to datasets: A MediaWiki-based method of annotating and extracting...

A platform for field notebook access and annotation: Wikisource

We quickly realized we needed a way to support the annotation of species occurrences on an open platform so that anyone interested could help with the task. We chose Wikisource (http://wikisource.org), a sister project of Wikipedia, for the following reasons:

Ease of use. Uploading scanned pages is simple: PDFs are uploaded to Wikimedia Commons and pulled into Wikisource. Once in Wikisource, hyperlinked index pages can be created, and transcribed text can be matched with the scanned image of each field book page (Figure 1). The wiki markup language is likewise easy to learn and use. Because it is the same markup used on Wikipedia, skills developed there carry over directly to Wikisource.
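To make the page-matching workflow concrete, the sketch below generates the titles a scanned notebook ends up with. It assumes the naming convention of the MediaWiki Proofread Page extension, in which an uploaded PDF gets an index page titled `Index:<file name>` and each scanned page is transcribed on its own page titled `Page:<file name>/<page number>`. The `<pages index=... from=... to=... />` tag is the extension's way of stitching a range of transcribed pages into one reading text. The function names, the file name in the usage note, and the default host `en.wikisource.org` are illustrative assumptions, not part of the method described here.

```python
from urllib.parse import quote


def index_title(pdf_name: str) -> str:
    """Title of the hyperlinked index page for an uploaded PDF (hypothetical helper)."""
    return f"Index:{pdf_name}"


def page_titles(pdf_name: str, num_pages: int) -> list[str]:
    """One transcription page per scanned page: 'Page:<file>/<n>', numbered from 1."""
    if not pdf_name.lower().endswith(".pdf"):
        raise ValueError("expected a PDF file name")
    if num_pages < 1:
        raise ValueError("num_pages must be positive")
    return [f"Page:{pdf_name}/{n}" for n in range(1, num_pages + 1)]


def transclusion_tag(pdf_name: str, first: int, last: int) -> str:
    """Proofread Page tag that assembles pages first..last into a single text."""
    return f'<pages index="{pdf_name}" from={first} to={last} />'


def page_url(title: str, host: str = "en.wikisource.org") -> str:
    """Browser URL for a wiki title; MediaWiki URLs use underscores for spaces."""
    return f"https://{host}/wiki/" + quote(title.replace(" ", "_"), safe="/:")
```

For a hypothetical upload named `Field notebook.pdf` with 120 scanned pages, `page_titles("Field notebook.pdf", 120)` lists the 120 pages that volunteers would transcribe side by side with the images, and `transclusion_tag("Field notebook.pdf", 1, 120)` would reassemble them into one readable text.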

Completely open access. Anyone can edit everything on Wikisource, which gives us a way to crowdsource annotation to citizen scientists and archivists. Every Wikisource page also has built-in edit tracking, which ensures that all changes made to the transcriptions are documented and reversible.
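The edit history mentioned above is also available programmatically through the standard MediaWiki Action API (`action=query`, `prop=revisions`). The following is a minimal sketch, assuming the English Wikisource endpoint and JSON output with `formatversion=2`. It only builds the request URL and flattens a parsed response into `(revid, parentid, user, timestamp)` tuples; because every revision records its parent, any change can be compared with, or reverted to, the version before it. The helper names are hypothetical, and the network call itself is left out.

```python
from urllib.parse import urlencode

# Assumed endpoint; any MediaWiki installation exposes the same api.php interface.
API = "https://en.wikisource.org/w/api.php"


def revision_history_url(title: str, limit: int = 50) -> str:
    """URL requesting the most recent `limit` revisions of one wiki page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|comment",
        "rvlimit": limit,
        "format": "json",
        "formatversion": 2,  # pages come back as a list rather than a dict
    }
    return f"{API}?{urlencode(params)}"


def summarize_revisions(response: dict) -> list[tuple[int, int, str, str]]:
    """Flatten a parsed JSON response into (revid, parentid, user, timestamp)."""
    rows = []
    for page in response.get("query", {}).get("pages", []):
        for rev in page.get("revisions", []):
            rows.append((
                rev["revid"],
                rev.get("parentid", 0),  # 0 marks the page's first revision
                rev.get("user", ""),
                rev["timestamp"],
            ))
    return rows
```

In practice the URL would be fetched with `urllib.request.urlopen` and decoded with `json.load` before being passed to `summarize_revisions`; the result is an audit trail of who changed a transcription and when.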

An existing community of developers. Wikisource runs on the same software as Wikipedia (a PHP application named “MediaWiki”), which a core team of developers actively maintains. Because the two projects share software and licensing terms, content can move freely between them. Additionally, pages designed to be incorporated into other pages (known as templates in Wikispeak; see http://en.wikipedia.org/wiki/

Figure 1. Web browser view of a scanned page of Henderson’s journal displayed side-by-side with transcriptions and annotations using the MediaWiki Proofread Page extension.