Wikisource:Bot requests/Archives/2010

Warning: Please do not post any new comments on this page.
This is a discussion archive first created in 2010, although the comments contained were likely posted before and after this date.
See current discussion or the archives index.


Swapping header templates

Is there any way to automate the swapping out of {{header}} and {{header2}} for {{Potus-eo}} for those Executive Order articles not already using the custom header template? The default parameters for minimal problems (I'm guessing) in such a conversion with what you'd typically find in an existing EO are...

 | title    = Executive Order - 13206
 | author   = William Jefferson Clinton
 | section  = 
 | previous = [[Author:William Jefferson Clinton/Executive orders|William Jefferson Clinton's Executive Orders]]
 | next     = 
 | notes    = Establishment of the blah… blah…<br>Delivered on 23 December 2000.

{{Potus-eo}} template KEY

 | eo        =  assigned 4 or 5 digit EO number  *-= A MUST! =-*
 | title     =  Populate using format [Executive Order {{{eo}}}]
 | author    =  Create or Transfer if present
 | section   =  Create or Transfer if present
 | year      =  Populate w/Default [blank]
 | month     =  Populate w/Null Value double-zero [00]
 | day       =  Populate w/Default [blank]
 | cite      =  Populate w/Default [''Federal Register'' page and date: -N/A-]
 | fr-vol    =  Populate w/Default seventy-four [74]
 | fr-page   =  Populate w/Default upper-Roman 2 [II]
 | fr-year   =  Populate w/Default year [2010]
 | fr-month  =  Populate w/Default month [01]
 | fr-day    =  Populate w/Default day [1]
 | notes     =  Create or Transfer if present

 | eo       = 13206
 | title    = Executive Order 13206
 | author   = William Jefferson Clinton
 | section  = 
 | year     = 
 | month    = 00
 | day      = 
 | cite     = ''Federal Register'' page and date: -N/A-
 | fr-vol   = 74
 | fr-page  = II
 | fr-year  = 2010
 | fr-month = 01
 | fr-day   = 1
 | notes    = Establishment of the blah… blah…<br>Delivered on 23 December 2000.

... so whatever may currently exist in {{{section}}}, {{{notes}}}, etc. should transfer to the like-named parameter in the new template. George Orwell III (talk) 08:51, 26 November 2009 (UTC)

Pretty sure that we can get a bot to do the translation from one template to the other.
  • do the blank values need to be in the template? Or can they be omitted?
  • is the >4 or 5 digit EO #< a variable that can be pulled from the work?
  • are there other values that may exist that can be grabbed from the file, eg. categories that complete the data?
billinghurst (talk) 05:18, 27 November 2009 (UTC)
  • Well, the blank values would prevent any bangs from occurring but still make the citation bar appear & be useful. Any existing {{{section}}} or {{{notes}}} values should transfer to the new template just fine.
  • The series of articles currently follow the page naming format "Executive Order 13388" for the most part (13388 being the 5 digit EO number in this case) but the {{{title}}} field can have "Executive Order 13388 - To order something for something by something" at times.
  • The template automatically adds the 2 relevant CATs using a helper template {{Potus-eo-data}} when applied. The third one, "PD-USGov", already exists in most of the existing EOs but not all unfortunately.
  • and as a starting point, Executive Order 13206 and higher are to be swapped. I think we've done all those earlier than 2001 (lower than EO # 13206) manually by now anyway. In fact it's possible that everything using a 4 digit EO# or lower may have been manually covered already so maybe going [Executive Order 10000] and up makes more sense.
George Orwell III (talk) 07:48, 1 January 2010 (UTC)
Just adding a couple bits... the "notes" field needs to be blank if not supplied; the others do not need to be there. Every EO has a date though, and all from early 1936 and on have a Federal Register citation, so those parameters probably should be left in for easier editing later (if the bot is *really* good, it can pick up the FR citation and date (and subtitle for that matter) from the executive order listing pages). The "title" value is defaulted to "Executive Order (num)" so, strictly speaking, it is not necessary -- but doesn't hurt. The "section" field is for the subtitle of the executive order; it is often already there but I have seen it sometimes be in the "notes" section. Sometimes the date is in the notes as well. Probably no way to really determine that though, and may be best just to leave it, and keep the same values for "notes" and "section" if they are already there.
Also, the template adds a DEFAULTSORT setting, so if the main body of the text already has one (never seen one yet) it should be removed. Carl Lindberg (talk) 01:57, 28 November 2009 (UTC)
Coming up on +3 months George Orwell III (talk) 09:00, 17 February 2010 (UTC)
That task looks manageable. If all the articles that needed the template swap were in a category (Category:Potus-eo needed, for example) this would be easy via regular expressions. The following will direct a pywikipediabot to do a simple swap like the one outlined above (may need a little tweaking):
python replace.py -cat:Potus-eo_needed -regex "{{header2.*\|.*title.*=.*Executive Order - (?P<EO>[0-9]*).*\|.*author.*=(?P<AUTHOR>.*)\|.*section.*=(?P<SECTION>.*)\|.*previous.*=.*\|.*next.*=.*\|.*notes.*=(?P<NOTES>.*)}}" "{{Potus-eo\n | eo = \g<EO> \n | title = Executive Order \g<EO> \n | author = \g<AUTHOR> \n | section = \g<SECTION> \n | year = \n | month = 00\n | day = \n | cite = ''Federal Register'' page and date: -N/A-\n | fr-vol = 74\n | fr-page = II\n | fr-year = 2010\n | fr-month = 01\n | fr-day = 1\n | notes = \g<NOTES> \n}}" -dotall
If you want the bot to figure out which articles to swap, things will be slightly more complicated. Let me know if you have any questions or want a more thorough explanation. Cheers, stephen (talk) 06:22, 18 June 2010 (UTC)
It is down to 300 or so EOs not under the custom header that are peppered throughout Category:Executive orders of 2001 thru Category:Executive orders of 2008. There are some that have been converted manually already if that's what you meant. George Orwell III (talk) 08:57, 18 June 2010 (UTC)

  Done by BenchBot. For future reference, he used this code:

python replace.py -cat:Executive_orders_of_2001 -regex "{{header2.*\|.*title.*=.*Executive Order (?P<EO>[0-9]+).*\|.*author.*=(?P<AUTHOR>.*)\|.*section.*=(?P<SECTION>.*)\|.*previous.*=.*\|.*next.*=.*\|.*notes.*=(?P<NOTES>(.* (?P<DAY>[0-9]{1,2}) (?P<MONTH>[A-Za-z]+) (?P<YEAR>[0-9]{4}).*))}}" "{{Potus-eo\n | eo = \g<EO> \n | title = Executive Order \g<EO> \n | author = \g<AUTHOR>| section = \g<SECTION>| year = \g<YEAR> \n | month = \g<MONTH>\n | day = \g<DAY> \n | cite = ''Federal Register'' page and date: -N/A-\n | fr-vol = 74\n | fr-page = II\n | fr-year = \g<YEAR>\n | fr-month = \g<MONTH>\n | fr-day = \g<DAY>\n | notes = \g<NOTES> \n}}" -dotall -summary:"Robot: replacing [[template:header2]] with [[template:Potus-eo]]"

Let me know if he missed anything. stephen (talk) 09:31, 27 June 2010 (UTC)

I scrolled through 2001 to current and they've all been converted to the custom header. With that, there should be no EO article using header or header2, and the articles from this most recent decade all have links automatically generated to a source document and related citation info at the very least. Many thanks again, and can somebody archive this when appropriate? George Orwell III (talk) 18:54, 27 June 2010 (UTC)
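For anyone reading this archive without pywikipediabot at hand, the substitution discussed above can be sketched in plain Python. This is only an illustration, not the code BenchBot ran: the function name `swap_header` is hypothetical, the default values are the ones listed in the template key earlier in this thread, and it assumes each parameter sits on its own line with a single-line value.

```python
import re

# Matches a {{header}} or {{header2}} call whose closing braces sit on their own line.
HEADER_RE = re.compile(r"\{\{\s*header2?\s*\n(?P<body>.*?)\n\}\}", re.DOTALL)
# One " | name = value" parameter per line (single-line values only).
PARAM_RE = re.compile(r"^\s*\|\s*(\w[\w-]*)\s*=[ \t]*(.*)$", re.MULTILINE)
# Accepts both "Executive Order - 13206" and "Executive Order 13206 - Subtitle".
EO_RE = re.compile(r"Executive Order(?:\s*-)?\s*(\d{4,5})")


def swap_header(wikitext):
    """Return wikitext with its header call rewritten as {{Potus-eo}}."""
    match = HEADER_RE.search(wikitext)
    if match is None:
        return wikitext
    params = {k: v.strip() for k, v in PARAM_RE.findall(match.group("body"))}
    eo = EO_RE.search(params.get("title", ""))
    if eo is None:
        return wikitext  # no EO number found: leave the page for a human
    number = eo.group(1)
    new_header = "\n".join([
        "{{Potus-eo",
        " | eo       = " + number,
        " | title    = Executive Order " + number,
        " | author   = " + params.get("author", ""),
        " | section  = " + params.get("section", ""),
        " | year     = ",
        " | month    = 00",
        " | day      = ",
        " | cite     = ''Federal Register'' page and date: -N/A-",
        " | fr-vol   = 74",
        " | fr-page  = II",
        " | fr-year  = 2010",
        " | fr-month = 01",
        " | fr-day   = 1",
        " | notes    = " + params.get("notes", ""),
        "}}",
    ])
    return wikitext[:match.start()] + new_header + wikitext[match.end():]
```

Parsing the parameters into a dictionary, rather than matching them in a fixed order, means the sketch does not care whether "previous" and "next" are present. A notes value that spans several lines would be truncated to its first line, so pages like that would still need a manual check.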

Chapters' iwiki

Set iwikis for Turgenev's works as follows:

And set next/prev links for Rudin. -- Sergey kudryavtsev (talk) 07:23, 2 June 2010 (UTC)

  Done with my SKbot. -- Sergey kudryavtsev (talk) 08:02, 29 September 2010 (UTC)

Unassigned requests

Interlinking sections of Alabama State Constitution of 1901

Within the text of this constitution, numerous amendments refer to previous amendments. E.g. here, at the bottom, is "amendment III [3] to the Constitution of Alabama", which should link back to here. Also, things like "section 260 of article XIV of the Constitution" should link back to the appropriate section. Note that later on the Roman numerals are discarded. Is this possible to do (relatively easily) with a bot? 22:50, 6 July 2008 (UTC)

There are many variations, and the links sometimes refer to the constitution and sometimes to the amendment:
  • articles: "Article X, Section 10";
  • sections: "section 10 of this /Constitution|article/", "section 10", "section 10.09", "sections 10.13, 10.14 and 10.15", "Section 10 1/2", "section 10 of article X", "section 10, article X", "section 10 of article 10", "Section 10. SECTION 10";
  • amendments: "amendment 10", "amendment X", "third amendment";
  • links to other texts: "Title 22, section 189, Code of Alabama 1940", "Code of Alabama, Title 22, section 189", "section 265 of Title 37 of the Code of Alabama of 1940";
  • exceptions: "...title to that certain sixteenth section of school lands described as follows: section 16, township 4 south,...".
A script could do most of the work, but it would be very heuristic. It would require a human willing to carefully review the changes to make sure they're in the correct context, and possibly search through the text for missing links. If you're willing to do this followup work, Pathosbot can take care of this task. —{admin} Pathoschild 13:07:26, 02 October 2008 (UTC)
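To give a feel for how heuristic such a script would be, here is a minimal sketch covering just one of the variations listed above ("section 10 of article X" and "section 10, article X"). The link target format and the names `link_sections` and `base` are assumptions made for illustration; the real page and anchor names would have to be taken from the text as it is laid out on Wikisource.

```python
import re

# Only handles "section N of article X" and "section N, article X".
# Roman or Arabic article numbers are both accepted.
SECTION_OF_ARTICLE = re.compile(
    r"\b(?P<text>[Ss]ection (?P<sec>\d+)(?: of|,) [Aa]rticle (?P<art>[IVXLC]+|\d+))\b"
)


def link_sections(text, base="Alabama State Constitution of 1901"):
    """Wrap recognised section references in a wiki link (hypothetical anchors)."""
    def repl(match):
        # Assumed anchor scheme: "#Article X, Section N" on the base page.
        target = "%s#Article %s, Section %s" % (
            base, match.group("art"), match.group("sec"))
        return "[[%s|%s]]" % (target, match.group("text"))
    return SECTION_OF_ARTICLE.sub(repl, text)
```

Because the pattern insists on the word "article" after the section number, the land-description exception ("section 16, township 4 south") is left alone. Every other variation in the list would need its own pattern, which is why a human review pass remains necessary.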

Index linking

Would it be possible to run a bot to check WhatLinksHere, and if it finds a Wikisource: index as listed at Wikisource:Works, and the current |previous= parameter is empty, then it inserts a back-link to the index? It would help tremendously, especially with things like poetry which are going to be a pain in the ass to get Indexes running. Sherurcij Collaboration of the Week: Author:Augustus John Cuthbert Hare 05:32, 19 February 2008 (UTC)

I suggest implementing {{indexes}}, since using the "previous" parameter provides incorrect metadata. See my proposal on the Scriptorium. —{admin} Pathoschild 23:10:09, 15 May 2008 (UTC)

Normalize US patents

US patents have been added with various naming conventions and formats, which should be synchronized. —{admin} Pathoschild 06:11:53, 08 May 2008 (UTC)


Migrate all uses of the old single-parameter invocation of {{Indent}} to either the new calling convention, or some other approach. John Vandenberg (chat) 01:01, 29 May 2008 (UTC)

Could you explain the new calling convention? The template page still advocates a single-parameter invocation. —{admin} Pathoschild 01:17:40, 29 May 2008 (UTC)
Documented on the template page. The other approach is to replace these invocations with <poem> . John Vandenberg (chat) 01:32, 29 May 2008 (UTC)
I see you changed "{{indent|number}} text" to "{{indent|text|number}}". Do you have any objection to using "{{indent|number|text}}" instead, to maintain consistency and line up the indentation amounts and texts? The bot can easily correct any pages using either old format. —{admin} Pathoschild 12:07:37, 04 July 2008 (UTC)
"number" is currently optional, and I have converted quite a few to the new syntax, and have been like that for quite a while, so I would rather not have the syntax change, as the current template is what is called when old revisions are viewed. Most cases where a number is specified would be better served with <poem>. John Vandenberg (chat) 12:21, 4 July 2008 (UTC)
I am not sure of the status of this job, and its specific requirements. Is it to be <poem> or modification of {{indent}}? -- billinghurst (talk) 13:29, 15 March 2009 (UTC)

Cut up formatted pages

All the remaining Easton's Bible Dictionary articles (maybe 90% by count) are formatted and on these 6 pages below. They need to be cut up and put in separate pages. Can a bot do it? --Carlaude (talk) 15:56, 21 August 2008 (UTC)

Importing text from DJVU to pages

Text layers now underlying.

-- billinghurst (talk) 16:25, 26 September 2009 (UTC)

Basic OCR Fix

Could somebody run a bot to replace all instances of tiie or Tiie or TIIE with "The", with the same capitalisation? Google tells me, there are well over a hundred such instances, all from OCR-ed texts. Sherurcij Collaboration of the Week: Author:Romain Rolland. 22:06, 23 March 2009 (UTC)

  Done I think I got them all, but we'll see when special:search catches up. -Steve Sanbeg (talk) 21:19, 20 April 2009 (UTC)

nope, still more than 400 "tiie"s on WS Sherurcij Collaboration of the Week: Author:Carl Linnaeus. 12:05, 15 September 2009 (UTC)
  Done tiie, need recheck

Hgures/hgures = figures Sherurcij Collaboration of the Week: Author:Carl Jung. 05:21, 21 April 2009 (UTC)
WiUiam=William, over 400 errors on WS. Sherurcij Collaboration of the Week: Author:Carl Jung. 06:14, 21 April 2009 (UTC)
  Done OK -Steve Sanbeg (talk) 22:43, 21 April 2009 (UTC)
Nope, still 70 remaining Sherurcij Collaboration of the Week: Author:Carl Linnaeus. 12:05, 15 September 2009 (UTC)
Didn't find 70, though updated those that my Google search discovered. Will need to db check. -- billinghurst (talk) 14:56, 15 September 2009 (UTC)
"bv" = "by", only on Page: namespace, not in main or other, and only in lowercase. There are nearly 5000 instances of this OCR typo on Page: namespace it seems. Sherurcij Collaboration of the Week: Author:Carl Jung. 19:10, 23 April 2009 (UTC)
Tentatively   Done only had about 500, not the extra order of magnitude. May need a rescan in a while. -- billinghurst (talk) 10:13, 16 September 2009 (UTC)
"tiiis" to "this", 85 instances[1] Sherurcij Collaboration of the Week: Author:Carl Linnaeus. 12:03, 15 September 2009 (UTC)
tentative   Done . My google search only showed 67, these are done, though we should db check. -- billinghurst (talk) 14:38, 15 September 2009 (UTC)
  Done -- billinghurst (talk) 13:37, 3 October 2009 (UTC)
  Done where worthwhile, skipped a lot of big ugly works needing some splitting and cleanup -- billinghurst (talk) 13:37, 3 October 2009 (UTC)

With the new PAGE: subspace, these are all an increasing problem - Google shows 10,000 "tiie" that should all be "the" in the PAGE namespace...can we schedule this to run once a month or something? Sherurcij Collaboration of the Week: Author:Thomas Carlyle. 18:45, 17 February 2010 (UTC)

I probably should apply for a toolserver account one of these days; but I just checked special:search, and only found a little over 200, which didn't take long to fix. -Steve Sanbeg (talk) 03:44, 19 February 2010 (UTC)

"centurv" to "century", please. Sherurcij Collaboration of the Week: Author:Thomas Carlyle. 23:52, 2 May 2010 (UTC)

Of the 48 occurrences of "centurv", 45 are in the Page: namespace - these are not transcluded because they are uncorrected ocr. 2 of the 3 instances in main space are also uncorrected ocr that need the scans to bring to standard: this and this have bigger problems than a single typo. I fixed the important occurrence of "centurv" to "century". Cygnis insignis (talk) 01:09, 3 May 2010 (UTC)
Seems that we need a means to identify missspeeling based on the proofread status. — billinghurst sDrewth 06:58, 3 May 2010 (UTC)
The point is that by correcting "centurv" on a shitty unproofread page, we will have less work to do when we do proofread that page, if these "25 commonly misspelled words" have already all been corrected across all pages. And since we often copy/paste OCR dumps from and such - it is not only the PAGE namespace that needs the fixes. Sherurcij Collaboration of the Week: Author:Thomas Carlyle. 15:14, 3 May 2010 (UTC)
Being able to search on proofread status is a great idea, I reckon that would have lots of applications.

Noting these errors is very useful, we should build a list of those that will reoccur. This could eventually be used to improve the ocr in an index, with a bot making many common corrections. Having to load the image to confirm corrections would be a factor when considering efficiency. Some users are already doing this semi-automatically with javascript, they could use the same list and add the ocr errors that appear in their Indexed copy. As users of the Page namespace would know, this can only catch some of the errors and one needs a scan to fix the rest. They would also realise that correcting obvious errors, or guesswork, is as far as a page of uncorrected ocr can progress without referring to a scan. Duplicating pages from IA, and so on, in mainspace, without a scan, or its source, is pointless. Cygnis insignis (talk) 05:04, 4 May 2010 (UTC)
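A shared list of recurring OCR errors, applied "with the same capitalisation" as first requested in this thread, might look something like the sketch below. The table holds only errors reported above; the names `OCR_FIXES`, `match_case` and `fix_ocr` are made up for illustration, and "bv" is deliberately left out because it was only to be fixed in lowercase and only in the Page: namespace.

```python
import re

# Lowercased OCR error -> lowercased correction (all taken from this thread).
OCR_FIXES = {
    "tiie": "the",
    "tiiis": "this",
    "hgures": "figures",
    "wiuiam": "william",
    "centurv": "century",
}

# Whole words only, matched without regard to case.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in OCR_FIXES) + r")\b",
    re.IGNORECASE,
)


def match_case(original, fixed):
    """Give the correction the capitalisation of the word it replaces."""
    if original.isupper():
        return fixed.upper()
    if original[0].isupper():
        return fixed.capitalize()
    return fixed


def fix_ocr(text):
    """Replace every listed OCR error in text, preserving capitalisation."""
    return PATTERN.sub(
        lambda m: match_case(m.group(0), OCR_FIXES[m.group(0).lower()]),
        text,
    )
```

For example, `fix_ocr("Tiie end of tiie centurv")` gives "The end of the century". A table like this only catches known misreadings, so it is a first pass before proofreading against the scan and not a substitute for it.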


Wikisource:Caliphs - can somebody change [[Author:Foo]] to [[Author:Foo|Foo]]? Merci. Sherurcij Collaboration of the Week: Author:David Livingstone. 18:41, 15 October 2009 (UTC)

  Done . Didn't need a bot, just used the Custom Regex in the sidebar. You are aware that you can code those like [[Author:Foo|]]? -- billinghurst (talk) 23:07, 15 October 2009 (UTC)
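For reference, the kind of regex replacement described here can be sketched in a few lines of Python. This is not the expression that was actually entered in the sidebar tool, and the function name `label_author_links` is hypothetical.

```python
import re

# An Author: link with no pipe and no section anchor, e.g. [[Author:Foo]].
BARE_AUTHOR_LINK = re.compile(r"\[\[Author:([^\]|#]+)\]\]")


def label_author_links(text):
    """Rewrite [[Author:Foo]] as [[Author:Foo|Foo]]; piped links are left alone."""
    return BARE_AUTHOR_LINK.sub(r"[[Author:\1|\1]]", text)
```

Links that already carry a label are skipped because the character class excludes the pipe.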

Unassigned requests

Updating CIA World Fact Book pages

Would it be possible for a botop to duplicate this format and create new pages with 2008 information? Thanks. Stepshep (talk) 22:50, 29 September 2008 (UTC)

I'd like to take a crack at it. With 2009 info, of course. :-) I'll experiment with coding this, assuming someone else hasn't already. --LarryGilbert (talk) 19:23, 14 November 2009 (UTC)
Go for it. I also note that recently a number of images were removed from the earlier version, so we may just wish to be aware of those links. billinghurst (talk) 23:19, 14 November 2009 (UTC)