2024 Wikisource film transcription tutorial

Several components of a film can be transcribed into text: primarily dialogue, text used to tell the story or establish the setting, and text which appears in scene. Transcription may allow for easier translation into foreign languages and subtitles, improved searchability, and wikilinking.

Film transcription has its own WikiProject on Wikisource, WikiProject Film, a community with a focus on improving Wikisource's coverage of film and works related to film. Its subpages also offer many resources on film, such as several lists of public-domain films that are currently available online and awaiting transcription at Wikisource.

Because processing film transcriptions in the usual way is extremely tedious (due primarily to current Wikisource technical limitations), it is highly recommended that you place film transcriptions in Wikisource:WikiProject Film/Drafts. There you will be able to proofread a film on a single page, and once you are done the film will be processed into the Index, Page, and Main namespaces automatically by a bot run by PseudoSkull. There is also a sandbox in the Index namespace specifically meant for tests related to film transcription (which should probably also be used to test technical improvements), located at Index:Sandbox.webm.

Key

Key (info)
Dialogue
In scene
Storyline

Dialogue

Refers to spoken words appearing in the media. When dialogue is in the vernacular, its spelling tends to follow its pronunciation.

For spoken dialogue, the following guidelines apply:

  • Terms with common alternative spellings, for instance in other English dialects, may be spelled in any way that is reasonably common, at the discretion of the editors. Whichever dialect of English is chosen should be used consistently throughout the transcription, though.
  • Filler words, or words that otherwise carry very little meaning and add very little to the transcript, may be omitted at the discretion of the editors. Such terms include gasp, uh, um, sigh, ahh (screaming), etc. Note that this does not apply to words such as so, well, you know, or alright, as these words carry more semantic meaning and are more obviously said.
  • Background speech that is very hard to hear, and is not meant to be heard by the audience, should be transcribed if possible, but where it cannot be made out, it should be left out of the transcription. No pages should be marked as problematic due to failure to make out background speech.

In scene

Text which appears during the film as part of the scenery. This covers text which may be helpful in searching or which is integral to the storyline or characters. Examples include Alice in Wonderland, where Alice opens a basket which says "Eat me," and Daydreams, where the text "Dr. Richard M. Scott's", naming a fictional character, appears on a sign.

Scenery text which appears more than once in a film should not, in most cases, be transcribed repeatedly, especially if it appears multiple times in the same scene, since the repetition is redundant. It should generally be transcribed only when it first appears in full and is readable.

Scenery text which is important or meaningful to the plot of the film should be transcribed, but unimportant scenery text can be left out or included at editor discretion. Scenery text which is deemed unimportant should not be marked as problematic if there are problems reading it. Irrelevant in-scene text is treated similarly to advertisements on Wikisource.

In some cases, such as in Safety Last! (1923) which is set in a big city and therefore features a very large amount of sign text which is irrelevant to the plot, it is probably better to leave a lot of the in-scene text out, as it would just clutter up the transcription, thus greatly obscuring more important dialogue and storyline elements.

Storyline

Text which is added as part of the story, which may include a silent film's intertitles, credits, or locations and dates.

Timeframes

In selecting a portion of time in which to capture a transcribable item one may use:

  1. a moment of time (e.g. [00:12]) for:
    1. Text appearing as part of the story line.
    2. Text found within the scenery.
  2. a window of time (e.g. [00:15-00:29]) for:
    1. Dialogue
      1. In selecting which portion of dialogue to include within a time frame, it may work to partition by scene. If the scene has a lot of dialogue, it may be better to split it up so there is less to verify per portion.
    2. Scrolling text as either part of the storyline or scenery.
    3. Usage of this is at the editors' discretion. Some editors will want to include every item of dialogue as a separate element, but including both a beginning and an ending timestamp for each is excessive. Ending timestamps are easy enough for readers to work out on their own, or sometimes even to guess from the length of the text, that they are not crucial.
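
The two timestamp forms above can be told apart mechanically. The following is a minimal Python sketch, not part of any Wikisource tooling: the function name parse_marker and the regular expression are hypothetical, and the MM:SS pattern is assumed from the examples in this tutorial rather than taken from an official format definition. Films longer than an hour may need a different notation, which this sketch does not attempt to handle.

```python
import re
from typing import Optional, Tuple

# Matches "[MM:SS]" or "[MM:SS-MM:SS]". This pattern is an assumption based
# on the examples in this tutorial, not an official format definition.
MARKER = re.compile(r"\[(\d{1,3}):([0-5]\d)(?:-(\d{1,3}):([0-5]\d))?\]")


def parse_marker(marker: str) -> Tuple[int, Optional[int]]:
    """Return (start, end) in seconds; end is None for a single moment."""
    match = MARKER.fullmatch(marker.strip())
    if match is None:
        raise ValueError(f"not a timestamp marker: {marker!r}")
    start_min, start_sec, end_min, end_sec = match.groups()
    start = int(start_min) * 60 + int(start_sec)
    if end_min is None:
        # A moment of time: storyline text or scenery text.
        return start, None
    end = int(end_min) * 60 + int(end_sec)
    if end <= start:
        # A window of time must end after it starts.
        raise ValueError(f"window does not end after it starts: {marker!r}")
    return start, end


print(parse_marker("[00:12]"))        # (12, None)
print(parse_marker("[00:15-00:29]"))  # (15, 29)
```

A check like this could be run over a draft before submitting it, to catch markers with a missing bracket or a window whose end precedes its start.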

Templates

  • {{film}} - used in the notes section of the header, typically to show .ogv and .ogg files at 400px, although the size can be modified using the size= parameter. It also includes a key for transcription type and adds the page to Category:Film.
{{Film
 |example.ogv
 |thumbtime=02
 |size=600px
 }}

Film trailers

Film trailers may be included at Wikisource, but as a subpage of the film in question (see for example Little Annie Rooney (1925 film) and Little Annie Rooney (1925 film)/Trailer).

If the film trailer survives and the film itself does not, it can be included as its own page (unless the film is one day found), as has been done at The Great Gatsby (1926 film trailer). This should also be done in the case of films where the trailers are freely licensed but the films themselves are not, such as in Citizen Kane (film trailer). It should be marked in the title in that case as "(film trailer)" and not "(film)".

When searching for film trailers to upload to Commons or here, make sure they are official film trailers. Many parties make fan-made trailers for classic films, so be careful. Such trailers are unoriginal, and should not be included here on that basis alone; they are also probably copyrighted, or at least copyrightable, as fan-made trailers tend to have been made in recent decades. Furthermore, including a film trailer you made yourself is unacceptable.

Unoriginal content

Content that is unoriginal to the film but is on the available file anyway should not be transcribed. Unoriginal content can include titles/endings giving credit to the preserver of the film (such as the ones placed by the Library of Congress), unoriginal watermarks, unoriginal timers, etc. Such content should probably be removed from our file itself where possible, since we want to portray the film as it originally looked.

Further modifications

Currently, when listing a segment of time as opposed to an instant, e.g. [00:07-00:49], there is a potential overlap of text. To avoid this, place

<div style="margin-left:{{{1|2}}}em; margin-right:{{{1|2}}}em;"> 

before listing the {{Page}} transclusions, and close it with </div> after them. Hopefully this is a temporary solution.

Future

  • Selected time used in index to govern the beginning and end time within its "Page:" by use of {{Temporal Media Fragment}}.
  • <pagelist/> to use time frames specified in the index for transclusion into mainspace using {{Page}}.

See also