Be Watchful of Text Fidelity

There was a recent debate about the textual fidelity of the JPS 1917 Bible. As a result, I created a template called {{fidelity}} for use when a text's source may have fidelity issues. It automatically adds such a text to Category:Doubtful Fidelity. Because texts are added from so many diverse sources, I'm sure this won't be the last time the fidelity of a text's source is in doubt. I suggest making this template (or one much like it) WS policy for all such cases, and making regular checks of that category part of the "daily grind". See WS:COPYVIO#JPS_1917 for the actual dispute. —Wikijeff 04:05, 1 July 2007 (UTC)

Good idea; probably should be put in Category:Text integrity and maybe renamed to Category:Doubtful fidelity. --Spangineerwp (háblame) 06:44, 2 July 2007 (UTC)
The fewer words in a template, the easier it is to recall offhand. "verify", "fact", "fidelity", "incomplete", "notability": these are all easily remembered, whereas it's difficult to recall whether to use "incomplete text", "text incomplete", "work incomplete" or "incomplete work". I think sticking with "fidelity" would be best. Sherurcij (Collaboration of the Week: Ernest Hemingway) 04:50, 13 July 2007 (UTC)


Not only of the sort being done on RLS's "Dr. Jekyll / Mr. Hyde", but possibly something like a modern "rendition" of, say, the Ballad of Gresham College. E.g., the 9th stanza would be:

The selfsame glass did likewise clear
Another secret more profound:
That naught but air into the ear
Can be the medium of sound,
For in the glass emptied of air
A striking watch you cannot hear.

Not much, I admit, but for works written before spelling was standardized, this could be as useful as a bunch of <ref> tags. 20:56, 2 July 2007 (UTC)


For example, Charles Darwin's Origin of Species has six versions, each containing slight modifications to the original. Rather than duplicating the text the versions share, each version could be a new page built from the same database source as the original, with its modifications tracked much as the wiki's version control tracks edits.

Texts Directly from OCR Sources

A lot of Wikisource texts are taken directly from existing texts posted to the Internet; these seem to constitute the majority of texts available here. But a small number of texts come directly from OCR-ed page images. In the cases where I have uploaded direct OCR output, I've always uploaded the page scans to Wikimedia Commons and linked to them from the uploaded work's Table of Contents, but I'm not sure other people always have. I created a template, {{pageimages}}, to identify works that are the direct product of OCR and to show where to find the page images so that proofing is possible. Perhaps this practice should become mandatory policy for texts produced directly from OCR output. —Wikijeff 06:07, 9 July 2007 (UTC)

You should have a look at the ProofreadPage extension. ThomasV 07:54, 9 July 2007 (UTC)
I read about the ProofreadPage extension without understanding it very well. Just as you upload scanned book images to Wikimedia Commons, I am thinking of uploading PDF images of the United Nations resolutions to Commons as well. The UN Administrative Instruction ST/AI/189/Add.9/Rev.2 releases Official Records, including compilations of resolutions, into the public domain, so I would like to upload the relevant PDF images to Commons. However, Template:PD-UN should probably be split according to which UN department produced each work, as I predict rapid growth in these works.--Jusjih 18:51, 18 July 2007 (UTC)

Central Location for Help with Books

It would be nice if there were a central location dedicated to requesting help with books. Say I have a somewhat rare book that is missing a page, and I am unable to find it at my local library or through ILL (Inter-Library Loan): it would be nice to have a dedicated place to post requests like "I am missing pages 4, 5, and 6 of XYZ work; can someone help me?". —Wikijeff 06:38, 9 July 2007 (UTC)

I think this would be a good subsection of Wikisource:Requested texts. -SCEhardT 20:03, 9 July 2007 (UTC)


We actually have a page dedicated to that purpose at Wikisource:Requests for assistance. It's not used often, and probably isn't well presented, but that would be the best place to make such a request (and I think that people have made requests such as you are suggesting).—Zhaladshar (Talk) 12:36, 20 July 2007 (UTC)

Users Letting Others Know the Libraries to Which They Have Access

It would be even better if users made known which libraries they have access to! To use a real-life example: I have scans of a rare work in German (and some Hebrew, I think) that is missing two pages, and I know that just about the only place in the USA with a copy (on microfilm) is one of Harvard's libraries. But I don't have access to Harvard or any of its libraries, and I don't think I can get this through ILL. I'd love to add it to (German) Wikisource, and have an English translation made here, but I can't because my scans of the work are incomplete. I'd love to ask someone for help, but I have no idea who here has access to Harvard's library. Further, not everyone reads the Community Discussion section, so even if I did ask there, what are the chances a Harvard student (or alumnus) would see it? Not good. But if users listed the libraries to which they had access (perhaps on a list page, or maybe with a userbox), I might actually be able to get images of the two missing pages! I could approach those most likely to be able to get the missing material themselves, in addition to making a general request. This would go a long way toward helping those who add rare works to this wiki! —Wikijeff 06:38, 9 July 2007 (UTC)

Even better would be if users uploaded their scans to Commons or Wikisource instead of simply making known where they found them. If you upload your scans of a work, even if you do not have all of them, then someone else can fill the gaps. ThomasV 07:52, 9 July 2007 (UTC)

Author namespace and search

I doubt that the average new user, when presented with the search bar, would expect to see Charles Dickens, the book, when he enters "Charles Dickens", or to be told that Jane Austen doesn't exist. Would it be appropriate to create cross-namespace redirects for cases such as Austen, and disambiguation links at the top of pages like Charles Dickens?

In my mind, search is Wikisource's greatest problem. Sorting via categories and index pages helps, but the lack of a decent, intuitive advanced search significantly reduces usability for the novice. Creating redirects helps, but it's only a band-aid, and many more are needed. --Spangineerwp (háblame) 06:40, 12 July 2007 (UTC)

I agree that we have a terrible search problem here. It's been proposed in the past to have a check box on the search bar to search for an author instead of a work. But the developers said such functionality would be available when this elusive "namespace manager" finally gets rolled out. That was over a year ago now. I think we need to try to get some developer to realize that Wikisource needs a more robust search feature and to make the necessary changes to get that to happen.—Zhaladshar (Talk) 12:40, 20 July 2007 (UTC)

Abolish author namespace

I agree; this is why I opposed the creation of an Author: namespace. Neither the German nor the Spanish Wikisource has one. (I think the German one used to have a prefix, then dropped the idea.)
Note that the Author: prefix in the title impairs not only search on Wikisource but also the Google PageRank of author pages, unless the user types the prefix into Google, which is very unlikely.
ThomasV 08:10, 12 July 2007 (UTC)
The search engine can be adjusted to search author pages by default, in which case searching "Charles Dickens" will return Author:Charles Dickens as the first or second result. The namespace also makes it possible to specifically search authors with the upgraded search engine, using the simple syntax "Author: Name".
I don't think the prefix affects Google's PageRank; a Google search for "Charles Dickens" actually returns the author page before the work of the same title. —{admin} Pathoschild 17:43:59, 12 July 2007 (UTC)
Well, if the default namespace for search were set to the "author" namespace, then it would no longer be the main one. I guess nobody wants that, because then search would be restricted to the author namespace. So I guess your proposal to adjust it is not serious, is it? Please try to make constructive remarks.
Second, your Google query is restricted to en.wikisource, so it's not very difficult to get the author page ranked first. However, my point was more general (and more important): it is about getting Wikisource pages returned by general Google queries, in order to increase the project's visibility. If you read the documentation, you will learn that page title affects Google's PageRank, whether you think it does or not.
ThomasV 09:25, 13 July 2007 (UTC)
My remarks were constructive, as you would have realized had you assumed good faith or, at least, read the $wgNamespacesToBeSearchedDefault documentation I linked to above. It is entirely possible to search both work and author namespaces by default.
The Google search I cited showed that the "Author:" prefix did not seem to disadvantage the title as compared to the same title without the prefix, since it was actually sorted first. Restricting the search to Wikisource did not affect sort order; without the restriction, searching "Charles Dickens" returns 'Author:Charles Dickens' as #105 and 'Charles Dickens' as #650.
The search results you cited suggest that the author prefix actually improves sort order (since the "Author" prefix will boost results when the search includes the keyword 'author') and does not noticeably affect sort order otherwise. Another of your search results confirms that additional data in the page title improves sort order. —{admin} Pathoschild 05:09:31, 14 July 2007 (UTC)
Sorry about namespace search; you are right, I misinterpreted this variable.
Concerning search engines, however, I do disagree. Page title does affect search ranking, as you finally admitted. Now I guess you will also admit that "charles dickens" is more likely than "author charles dickens" as a query string. In that case, the prefix will affect sort order, and it will do so noticeably. This can be checked by comparing rankings between Wikisource pages. For example, the author page for Charles Dickens gets a higher ranking on the French Wikisource than on the English one, even though the English one is much more complete and gets more internal links. This affects general (unrestricted) queries. For example, if I try a general query for charles dickens, I see the French Wikisource author page on the fourth page of results returned by Google; the English one is on the 14th page. Another example: for Friedrich Nietzsche, the English author page is very poorly ranked.
This is quite logical: if you boost a page's rank for a particular keyword, you can only do so at the expense of other pages. Therefore, adding more keywords to the title decreases a page's rank for general queries.
ThomasV 07:53, 14 July 2007 (UTC)
I don't think that is the case. The reason the French Wikisource gets a higher priority in your searches is that you're using Google in French. In my own language-neutral search, the sort order is as follows:
Filtered to Wikisource domains, the sort order is:
As you can see, over a hundred pages that include extra keywords in the title are sorted before either Wikisource page, and in all language-neutral searches both English pages are sorted before the French page. Further, the author page is sorted before the work page in all language-neutral searches. While searching for extra keywords will naturally sort pages whose titles or contents include those keywords first, the reverse does not seem to be true: searching only the author name will not reduce the sort order for pages with extra keywords.
When no extra keywords are used, Google seems to apply its PageRank algorithm, which bases sort order on incoming links:
PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important".
{admin} Pathoschild 17:03:16, 16 July 2007 (UTC)
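The PageRank description quoted above can be illustrated with a minimal power-iteration sketch. This is the textbook formulation, not Google's production ranking, which combines many undisclosed signals; the damping factor of 0.85, the iteration limits, and the page names in the example are assumptions chosen purely for illustration.

```python
def pagerank(links, damping=0.85, max_iter=200, tol=1e-10):
    """Textbook PageRank by power iteration (illustrative sketch only).

    links: dict mapping each page to the list of pages it links to;
    each link counts as one "vote" for the target page.
    Returns a dict of scores that sum to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Pages with no outgoing links spread their score evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}
        # Each page splits its current score among the pages it links to,
        # so a vote from a high-scoring ("important") page is worth more.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical four-page web: A, B and D all link to C; C links back to A.
    web = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.4f}")
```

In this toy example C, which receives three votes, ranks highest, while A, linked only from C, still ranks far above B and D, which receive no links at all: a single vote from an "important" page carries real weight, as the quoted description says.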
OK, you are right again. I did not know about the "hl" option. Sorry for the confusion. ThomasV 08:42, 17 July 2007 (UTC)

Another reason why the Author namespace might be abolished is that it doesn't fit the general Wikisource link used on Wikipedia. If you want to find out more about someone who was not an author, the description "Wikisource has documents by or about" doesn't work at all, since subject bibliographies are discouraged, are they not?
If I want to find out more about Thomas Jefferson, only the fact that he is an author allows me to find works about him, and even then I feel apologetic for including information about him there since the page is clearly introduced as an "Author" page.
Right now the only way to link to works about a person who doesn't write is to create a whole category about them, which can't be organized the way a bibliography helpfully would: descriptions of the works have to be kept separate from the list of links to the works themselves (or duplicated), and the links have to stay in alphabetical order.
Here is another problem along these lines, with non-human subjects: in the article Lives (Dryden translation)/Theseus, for instance, there are about seventy-five links to Wikipedia. But you can't create a subject "Ancient Greece" (or whatever the name of the Wikipedia article is) to let Wikipedians be guided by great works like Plutarch's Theseus to learn about the subject. They have to know ahead of time that there is a great work by an author called Plutarch describing ancient Greece, go to that Wikipedia article, and then go to the author page and his work.
If you want to know about other such works, you can go to Category:Ancient Greek writers or, luckily in this case, the Ancient writers list, but you get no works by more modern authors. If you create a category, you run into the same problems as when trying to produce a single reference to works about people who aren't authors.
(I should also mention that you would have to change the category description page each time you add a new work belonging to that subject category, while with a Subject page you would avoid redundancy by adding just one link and a description [rather than two links and a description, with both links showing up in nearly the same place]. Secondly, a category gives no indication whether a subject is mentioned very briefly in a work or is its main subject, which goes against the philosophy that works should carry a few broad categories. Having a subject page would free you to be exhaustive without adding nearly pointless categories to works.) 05:37, 16 July 2007 (UTC)
To link to a work on Wikisource, {{wikisource}} says "original text related to this article"; to link to an author page, {{wikisource author}} says "original works written by or about". The search feature allows you to find pages about an author or particular subject ({{wikisource}} links to the search feature by default), and Category:Works by subject also helps for broader subjects. However, Wikisource is a collection of source texts, not an encyclopedic appendix; its primary purpose is the works themselves, not compiling bibliographical lists for Wikipedia.
While subject index pages work well in theory, past attempts on Wikisource have failed miserably. They are routinely out-of-date, incomplete, often inaccurate, and poorly organized, while giving the impression that they are comprehensive indexes. For example, see a recent deletion discussion for an index of fiction; despite being far broader than what you describe, it was virtually abandoned. Index pages for specific subjects would be even more difficult to keep updated.
A solution for authors is a section like "Articles about this author", such as on Author:William Cullen Bryant. For non-authors, a list in the relevant "External links" section on Wikipedia would be more appropriate (see featured article example), since Wikipedia is an encyclopedia that should contain bibliographical references. —{admin} Pathoschild 17:32:18, 16 July 2007 (UTC)

New Template for Texts with Sections Subject to Scholarly Dispute

For scholarly and critical works, it would be nice to have a template to mark sections of the text that are subject to scholarly dispute. I don't mean disputes among Wikisource editors over source purity, etc., but disputes that arise from reconstructing a text from multiple differing sources. For example, the Epistle of Barnabas 1:6 includes text marked by the dagger character '†', which in textual criticism means that portion of the text is disputed by scholars. It would be nice to have a template in which disputed text could be marked (perhaps rendered slightly gray) and automatically enclosed by the '†' character. I don't know whether Labeled Section Transclusion would help with such sections, but if it would, the template should use that too. —Wikijeff 01:55, 30 July 2007 (UTC)