Thanks for taking an interest in this and turning it into a project.

In setting up the project you may want to take into account the supplements. Until now only two articles from the first supplement have fallen within the alphabetical range in which I was working. These were designated with the disambiguator (DNB01), since the first supplement was published in that year. The second supplement is also in the public domain. The third and fourth are at least in the public domain in the U.K. Eclecticology 09:05, 17 June 2008 (UTC)

Thanks for your question. I've been having arguments on this for quite some time, and feel that the issue has never properly been considered. I tend to consider consistency with past practice as a less than convincing argument.
It all comes down to how we define a work. Some lend themselves quite well to a "Title/Chapter" format; whole novels fit that model very well. Other books are really collections of independent works. Short stories fall into that. 100 years ago many of these short stories were first published in magazines, before being collected in a book. When they were collected they were often subject to considerable editorial change which we, as scholars of their future, should be taking into account.
Magazines from that time were also collections of independent works. The short story of a still popular author would often be followed by an essay from a deservedly forgotten author. This doesn't change the fact that they were independent works. Listing these as "Magazine/Article" results in the article, which is what is really important, being lost in the absence of a more sophisticated system for finding the articles. Someone is diligently working to keep the author pages categorized, but that is not where the real need lies.
Encyclopedic articles (including geographical and biographical directories, and dictionaries) are closer to the edge. Some, like the DNB, are the product of multiple authors, but others may be entirely the product of a single author. Alphabetical order will often be the only thing that connects two successive articles. To me the most important feature of these articles is their content, not the fact that they happen to inhabit the same encyclopedia.
I have also been trying to take into account the fact that articles about the same subject in different encyclopedias have a closer connection than two dissimilar articles in the same encyclopedia.
Although I have used expressions like "(DNB00)" and "(EB11)" as disambiguators, I am not myself convinced that it is necessarily the best approach. I only find that the "Encyclopedia/Article" format is much less helpful. Eclecticology 18:59, 19 June 2008 (UTC)

Question around ToC (Moved to Wikisource talk:WikiProject DNB) -Arch dude 14:26, 26 August 2008 (UTC)


Being new at the wikisource DNB project, could you review things I've done & let me know if I'm barking up any wrong trees? Dsp13 (talk) 01:10, 30 October 2008 (UTC)

Yes, my preference would be the same as yours: mixing editions around seems bad practice to me if one is providing a version of a decades-old text, & liable to lead to several sorts of confusion later. Dsp13 (talk) 22:38, 1 November 2008 (UTC)

Task ordering for the 'bot?

Hey Arch dude,

I hope this answers your questions in regards to the bot work I have been doing. The bot does the pages in numerical order (1, 2, 3, 4, etc). I have to tell the bot what text to bot. The reason I skipped a bunch of the text is because the DNB texts were mostly uploaded incorrectly. They are supposed to have the Google page removed from them. That is something I want to work on when I have some time. It has been hard with the holidays.

As far as doing the actual bot work it only takes a few hours per text. Something to keep in mind too is you have to click the pages link to see what pages have text on them. This doesn't auto refresh.

It won't really mess anything up if you do a page ahead of time, but if you do, I'll have to tell the bot whether or not to copy over the page. It'll be a little more of a manual process for me if you do.

Please leave any additional questions, comments or ideas on my talk page. I hope this answers your questions.

Kind regards,

Mattwj2002 (talk) 10:38, 29 November 2008 (UTC)

Arch dude, in a few hours I believe the first 8 volumes will be done (I am botting right now). If you can continue to work on these volumes and upload the correct version of the text I would really appreciate it. I'll continue to bot as time permits. --Mattwj2002 (talk) 11:11, 29 November 2008 (UTC)
Arch dude, we need to take the Google page off of the text. Don't worry about it. The uploader will fix it or I will. Soon you'll have 8 Volumes to proofread! :) I am sure by the time the first 8 volumes are all proofread, we'll have all of the text botted up. Feel free to drop me a message anytime. I love to get messages! :) --Mattwj2002 (talk) 11:49, 29 November 2008 (UTC)

DNB Poor Quality Scans

Arch dude, please add your comments to here.

Arch dude, I answered your question here. I hope to hear from you soon. --Mattwj2002 (talk) 10:11, 15 December 2008 (UTC)
Arch dude, please start by finding the best possible scans from the Internet archive and add them here. Once we get a complete list, I'll start uploading again. It'll probably have to wait until after the holidays. --Mattwj2002 (talk) 10:59, 16 December 2008 (UTC)

welcome back

You've been hiding! Nice to see you back and active. I have modified bits so that you are now autopatrolled. -- billinghurst (talk) 09:57, 6 September 2009 (UTC)

DNB listing issue

I have tried to respond to your request for a DNB project "how-to": see the talk page thread there. Is that what you meant? Details can be expanded. Thanks for the general encouragement: 2009 was mostly about trying to make progress on a broad front, mapping out the problems while not getting too hung up on things that are hard to resolve.

The "how-to" write-up brought it home to me that there is still one area where progress has been slow, namely the volume ToCs. Having people add names one by one (or even three by three) is not too efficient. Some new thinking seems to be required, at one point. Namely, the way to do it should be to start with the listings at the back of each volume. After proofing those, the listings should be tidied down to a list of articles only (I think we remove the "redirect" entries for the volume ToCs), and conventions applied, mostly removing aristocratic titles. Then - and this is the new feature - I think the format with [[Smith, John (DNB00)|Smith, John]] is not the right way to proceed. It is superior to do it as {{DNB lkpl|Smith, John}}. The outward effect is identical, but this is much handier to create from a list of 500 names; and what is more the list is then more easily processed into any other format required. Charles Matthews (talk) 08:53, 5 January 2010 (UTC)
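The conversion described above, from a tidied list of names to {{DNB lkpl}} lines, could be scripted roughly as follows. This is only a sketch in Python: the function names to_lkpl_lines and to_piped_links are hypothetical, and it assumes the input is one article title per line without the "(DNB00)" suffix. It is not part of any existing tool.

```python
def clean_names(names_text):
    """Split a block of text into non-empty, trimmed names (one per line)."""
    return [line.strip() for line in names_text.splitlines() if line.strip()]


def to_lkpl_lines(names_text):
    """Wrap each name in the {{DNB lkpl|...}} template (assumed input format)."""
    return ["{{DNB lkpl|" + name + "}}" for name in clean_names(names_text)]


def to_piped_links(names_text, suffix="DNB00"):
    """Produce the older piped-link format from the same list of names."""
    return [
        "[[" + name + " (" + suffix + ")|" + name + "]]"
        for name in clean_names(names_text)
    ]


if __name__ == "__main__":
    sample = "Smith, John\nSmith, Mary\n"
    print("\n".join(to_lkpl_lines(sample)))
    print("\n".join(to_piped_links(sample)))
```

Because both formats are generated from the same plain list, keeping the list itself as the working copy makes it easy to re-emit the entries in whatever form is needed later.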

The category for DNB-without

There is one odd thing about w:Category:Articles incorporating DNB text without Wikisource reference, which is that it fails to pick up defaultsort from <ref>{{DNB}}</ref>. Which is a peculiar construction anyway, but there are a couple of examples on the first page of the category. That should always be replaced by <ref>{{DNB Cite|.}}</ref>, naturally. Don't ask me why it happens - do you have any idea? Charles Matthews (talk) 21:01, 2 February 2010 (UTC)

Probably it comes down to the order in which the additions are rendered and when DefSort gets applied. POEM and REF are weird and wonderful beasts sent to keep our sanity levels off-kilter. billinghurst sDrewth 01:03, 3 February 2010 (UTC)
We might be able to work around this in the affected articles by moving the "defaultsort" to the beginning of the article. -Arch dude (talk) 12:42, 4 February 2010 (UTC)
I had more in mind creating the relevant articles here, though! I mentioned it mainly in case there was some sort of malignant bug involved. It's not a serious issue for working with the category, just obvious to anyone working through in order. Charles Matthews (talk) 16:57, 4 February 2010 (UTC)

Your concerns

I appreciate what you're saying over at the DNB Project Talk, but we are where we are. Trying to look at it straight, the original uploads did not always choose the correct scans, and basic checks for completeness weren't run at that point. I have encountered related issues at the Catholic Encyclopedia project, where the bot run simply missed at least 3% of the work. I think the correct analysis is not "something should be done" (all agree), but noting that the Catholic Encyclopedia is neglected even though WP references it at least 5000 times (and nobody much cares), the real issue is having a DNB project with enough momentum to push the fixing of issues up the list of priorities. We are roughly at that point, I think, where the urgency does seem to be there (volunteers come along, and we'd like them to encounter a situation in somewhat better shape).

Still, the improvements are going to have to be piecemeal, incremental changes. The initial troubles came from wanting to say "all done" for the uploads; sorting it all out will take a while yet, and asking for rapid progress sounds to me like a recipe for hurried work, which is exactly your other concern really. I believe all edits are retained somewhere on the database, but I don't have any details of how ProofreadPage actually operates; so I can't tell you for sure that administrator access would be able to retrieve all the work. But perhaps others can cover the technical details. I feel the project does have the human resources to make great strides in 2010, and I'd hope your "issues" can be addressed in that context. Charles Matthews (talk) 10:29, 16 February 2010 (UTC)

I did not mean to sound as if I am complaining. I will continue to work as I have been, i.e., sporadically. I am very encouraged by the project's progress: I merely wish to avoid creating extra work for you and Billinghurst, so if there are changes I can make to my work methods to this end, I wish to make them. -Arch dude (talk) 15:48, 16 February 2010 (UTC)

Page splice for Humberston, Francis Mackenzie (DNB00)

Thank-you !! JamAKiska (talk) 01:30, 18 May 2010 (UTC)

Page splice for Dundas, David (1735-1820) (DNB00) Thank you. Daytrivia (talk) 11:34, 8 June 2010 (UTC)

When doing Author fixes

Gday Arch dude. When you are doing the author fixes on the older articles, I was wondering whether you would also be able to do the
| volume = XX addition to the header, and look to convert the transclusion to <pages>

<div class="indented-page">
<pages index="Dictionary of National Biography volume XX.djvu" from= to= fromsection="" tosection=""/>

I have followed you through tonight and got the earlier edits, and will try to get the later lot, though no promises as it is very late (early) Thanks. — billinghurst sDrewth 16:44, 3 July 2010 (UTC)

Opinion sought on DNB disambiguation

Please comment in the thread at User talk:JamAKiska#Disambiguation criteria, on the issue of whether we should have titles effectively contained in that of another biography. Charles Matthews (talk)

Re DNB missing articles

This is certainly one of my concerns, since the construction of the volume ToCs is fallible, surely. The Catholic Encyclopedia project is missing some hundreds, and I only wish I had more time to sort it out.

Here's how I see it:

  1. I have now completed the Author page lists (except for about 300 anonymous articles), though I'm fallible also. But we should catch "missing" articles via the redlinks on these lists.
  2. Patches of neglected text will show up as validation proceeds.
  3. Checking of the volume lists posted on WP will prompt the asking of good questions.

In particular, as the WP volume lists are worked over, we should add page numbers (to clarify the subarticles and the Supplement articles, and to fix some bad errors, transpositions and omissions caused by poor OCR).

Given that all three approaches are capable of finding what's been missed, we should get there in the end. Charles Matthews (talk) 14:31, 27 September 2010 (UTC)

Ultimately, we should ask a competent 'bot programmer to build a 'bot. The 'bot would check each page in turn in page space to see that the page is completely spanned by sections (i.e., each portion of the page is within a section). The 'bot would then check the page's "what links here" list to see that each section is listed in an article. The 'bot would then verify that each article is in the TOC. The 'bot would then list all pages for which these checks fail. -Arch dude (talk) 14:46, 27 September 2010 (UTC)
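As a rough illustration of the checks described above, here is a sketch in Python that works on in-memory data rather than the live wiki. Everything in it is an assumption made for illustration: the section marker syntax it looks for, the function names, and the dictionaries standing in for the "what links here" list and the TOC. A real 'bot would have to fetch all of these from the wiki itself.

```python
import re

# Assumed marker syntax: <section begin="name" /> ... <section end="name" />
SECTION_RE = re.compile(
    r'<section\s+begin="?([^"/>]+?)"?\s*/>(.*?)<section\s+end="?\1"?\s*/>',
    re.DOTALL,
)


def uncovered_text(page_text):
    """Return any text on the page that lies outside every section."""
    return SECTION_RE.sub("", page_text).strip()


def section_names(page_text):
    """List the names of all complete sections found on the page."""
    return [m.group(1).strip() for m in SECTION_RE.finditer(page_text)]


def check_pages(pages, transcluded_in, toc):
    """Run the three checks and return a list of (page title, problem).

    pages          -- {page title: wikitext} (hypothetical input)
    transcluded_in -- {(page title, section name): article title}
    toc            -- set of article titles listed in the volume ToC
    """
    problems = []
    for title, text in pages.items():
        # Check 1: the page is completely spanned by sections.
        if uncovered_text(text):
            problems.append((title, "text outside any section"))
        for name in section_names(text):
            # Check 2: each section is used by some article.
            article = transcluded_in.get((title, name))
            if article is None:
                problems.append((title, "section %r not used by any article" % name))
            # Check 3: that article appears in the TOC.
            elif article not in toc:
                problems.append((title, "article %r missing from TOC" % article))
    return problems
```

The function returns a flat list of problems so that the output can be posted as a simple work list of pages needing attention.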

I think the functionality of the Magnus tool (second section) already does some of that, picking up non-transcluded pages (those marked "Proofread", I believe). Charles Matthews (talk) 16:36, 27 September 2010 (UTC)


Have outlined first step to transclusion on DNB01 on John's talk page. Time permitting please review. Thanks in advance...JamAKiska (talk) 16:04, 3 October 2010 (UTC)

