Index talk:Congressional Record Volume 81 Part 3.djvu

Latest comment: 9 years ago by Waldir in topic Uploading new djvu files

Splitting the pages


@Waldir: I've seen your post to Tpt on fr.ws and I'm currently downloading the PDF, so I'll split the pages with ABBYY and re-upload them... --Ernest-Mtl (talk) 19:26, 25 July 2015 (UTC)

Woohoo, awesome! Thanks a bunch. By the way, let me know if you run into problems uploading the file. I spent several hours trying different approaches until I got this to work with some IRC help. --Waldir (talk) 19:27, 25 July 2015 (UTC)
@Waldir: One thing just occurred to me, though... I can't make a PDF with a text layer, only DjVu files... Do you mind? --Ernest-Mtl (talk) 19:54, 25 July 2015 (UTC)
Worst-case scenario, I could prepare the PDF without a text layer and upload it so you can take it and add the text layer...
@Ernest-Mtl: Of course not, djvu is fine. I only care that the scanned pages are available for proofreading here :) --Waldir (talk) 20:40, 25 July 2015 (UTC)
@Waldir: Ok... Working on it right now... Splitting pages and cropping the black areas around the pages... Then I'll do the OCR, make the DjVu and let you know when it's on Commons. --Ernest-Mtl (talk) 21:08, 25 July 2015 (UTC)
PS: I've seen that some pages are not fully aligned, so I'll also run a realignment on the text to make the OCR and proofreading easier... :)
@Ernest-Mtl: I wonder if there's any way to reuse the text layer already present in the pdf, rather than starting from scratch? Or are you confident that the OCR you can perform is higher-quality than the Internet Archive's? --Waldir (talk) 22:00, 25 July 2015 (UTC)
@Waldir: Hello! Usually, my OCR is better than IA's... :) I use the professional ABBYY suite... --Ernest-Mtl (talk) 23:05, 25 July 2015 (UTC)
Got it :) Please proceed! --Waldir (talk) 00:26, 26 July 2015 (UTC)
@Waldir: Hello! Don't worry, everything is done, except that for no apparent reason Commons doesn't let me upload the file (413 Request Entity Too Large)... I've posted a message but haven't received an answer yet. --Ernest-Mtl (talk) 00:06, 27 July 2015 (UTC)
@Ernest-Mtl: Ok, so as I suspected you bumped into issues with the uploading. I had them as well. Do you have chunked uploads activated in your preferences? That's the first step to get it to work (otherwise your upload limit is 100MB). If you do, then take a look at commons:Special:UploadStash. Does a file appear there? If so, I may be able to help you to publish it.
Alternatively, if you don't feel like jumping through a lot of hoops to upload the file (it's a temporary bug that will be resolved soon anyway), feel free to send me the file via dropbox, google drive, mega or any other online hosting service you like, and I'll attempt to upload it myself. I had some trouble uploading the original pdf, but eventually was able to get it to work, so I believe I may be able to work with this one as well. --Waldir (talk) 00:57, 27 July 2015 (UTC)
@Waldir: Ahhhhh! Yes, I had chunked uploads activated. My file is in the stash... What can I do from there? --Ernest-Mtl (talk) 01:06, 27 July 2015 (UTC)
@Ernest-Mtl: Follow the instructions here. Let me know if it works :D --Waldir (talk) 03:33, 27 July 2015 (UTC)
@Waldir: Wonderful! It worked! I had to upload it under a temporary name, though, and rename it afterward... I think the system couldn't handle the same filename for such a big file... File:Congressional Record Volume 81 Part 3.djvu --Ernest-Mtl (talk) 04:10, 27 July 2015 (UTC)
Great!! It looks amazing, superb job :D I'll start a new thread to discuss non-upload issues. --Waldir (talk) 12:42, 27 July 2015 (UTC)

Page rename


I changed the index name from pdf to djvu, as I usually do on the French Wikisource, but something seems to be wrong: "numeric value needed"... :( I would have thought all the Wikisources worked the same way... Sorry, I should have left it as it was! Gulp! --Ernest-Mtl (talk) 04:30, 27 July 2015 (UTC)

That's ok, I'm sure it can be figured out. It is preferable to have the page under the new name. I'll try to solve this later today. Thanks a million for all the help :) --Waldir (talk) 12:42, 27 July 2015 (UTC)
@Ernest-Mtl: Woah, there's something strange going on with the file: page 130 of the djvu file corresponds (correctly) to page 2504 of the document; page 131 is then 2505, as expected; but page 132 is 2504 again! And they are different images, not the same one twice -- see p.130 and p.132. Same happens with p.131 and p.133. Possibly an error in the source?
Also, note that there must be other places where this happens, since the total number of pages is 1199, but real pages should be only 1194: title + blank + 1191 pages (2377 to 3568) + blank. So 2504 and 2505 appearing twice accounts for 2 of those extra pages, but there are 3 more lurking around somewhere (I can dig them up but first let me know what you think about this). --Waldir (talk) 16:24, 27 July 2015 (UTC)
Update: they appear to have been duplicated in the scanning phase, see p.77 and p.78 of the unsplit pdf file. I guess that one needs to be fixed too. --Waldir (talk) 16:33, 27 July 2015 (UTC)
@Waldir:... Ok... looking into this right now... --Ernest-Mtl (talk) 19:22, 29 July 2015 (UTC)
Already found 2726 & 2727 as duplicates too... --Ernest-Mtl (talk) 19:29, 29 July 2015 (UTC)
@Waldir:... ok, those were the last ones... I have 1195 pages: cover + blank + 1192 pages (2377 to 3568) + blank... I'll recompile the djvu... --Ernest-Mtl (talk) 19:43, 29 July 2015 (UTC)
@Ernest-Mtl: Huh, I must have miscalculated, then. So there are 4 extra pages, not 5. Cool :) I'll remake the pdf as well. --Waldir (talk) 19:50, 29 July 2015 (UTC)
@Waldir: Have you found a way to put the pagelist back on the main page since the rename from pdf to djvu? Or do we need to delete this page and redo the entry? --Ernest-Mtl (talk) 20:09, 29 July 2015 (UTC)
Confirmed, I miscalculated earlier. Those are indeed the only duplicate pages: 2504-2505 (78 in the pdf, 132-133 in the djvu) and 2726-2727 (190 in the pdf, 356-357 in the djvu). I've produced a deduplicated pdf and will upload it now.
As for the pagelist error, I think it might be a matter of filling in the details correctly. I'll give it a try, and if needed, I'll ask on IRC. Deleting the page will probably be the last resort. --Waldir (talk) 20:55, 29 July 2015 (UTC)
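
For reference, the page arithmetic from this thread can be checked with a short Python sketch. The figures are the ones quoted above (first and last printed page, the 1199-page count of the first DjVu, and the two duplicated pairs); the variable names are only illustrative.

```python
# Page-count check using the figures quoted in this thread.
FIRST_PAGE = 2377      # first printed page number in the volume
LAST_PAGE = 3568       # last printed page number in the volume
FRONT_AND_BACK = 3     # title/cover + blank at the front, blank at the back
DJVU_PAGES = 1199      # page count of the first DjVu upload

# The range is inclusive, so add 1 (this is the step that is easy to miss).
body_pages = LAST_PAGE - FIRST_PAGE + 1
expected_total = body_pages + FRONT_AND_BACK
extra_pages = DJVU_PAGES - expected_total

# Duplicated printed pages identified in the discussion.
duplicated_pairs = [(2504, 2505), (2726, 2727)]
duplicate_count = sum(len(pair) for pair in duplicated_pairs)

print("body pages:", body_pages)
print("expected total:", expected_total)
print("extra pages in DjVu:", extra_pages)
print("duplicates found:", duplicate_count)
```

With these inputs the number of extra pages equals the number of duplicates found, which is consistent with the two pairs being the only duplicated pages.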

Uploading new djvu files


@Waldir: The new DjVu file is being uploaded right now. --Ernest-Mtl (talk) 21:14, 29 July 2015 (UTC)

Oops... File entity too large... I wonder how we'll get around this one! :( --Ernest-Mtl (talk) 21:17, 29 July 2015 (UTC)
@Ernest-Mtl: Isn't that the same issue as before? I had the same problems this time (uploading the deduplicated pdf) that I had earlier when uploading the original file, and had to solve them as mentioned above. Doesn't it work anymore for you? By the way, note that you can't use the same filename as before ("Congressional Record Volume 81 Part 3.djvu"); it will need to be history-merged afterwards with the original filename. But don't worry about that, I'll take care of it :) --Waldir (talk) 21:38, 29 July 2015 (UTC)
@Waldir: Ok.. I'll upload it under a new name and let you know... Tomorrow though, now it's kinda late! :) --Ernest-Mtl (talk) 04:49, 30 July 2015 (UTC)
Heheh, you're staying up late too, I see :) No problem, looking forward to it! --Waldir (talk) 12:15, 30 July 2015 (UTC)
@Waldir: Yep... I've discovered in Quebec's archives a novel that was never published as a book, only in weekly installments in a Montreal newspaper, and then forgotten... So I'm quite excited to put it all back together on Wikisource... As for the congress file, all done... the newly uploaded file is File:CR V81 P3.djvu... --Ernest-Mtl (talk) 14:45, 30 July 2015 (UTC)
@Ernest-Mtl: Nice find with the novel! As for the DjVu, I merged the files, but I am afraid the file might have some problem: the summary in the versions table shows up as "0 × 0 (257.95 MB)", as opposed to the original "3,264 × 4,416, 1,199 pages (258.84 MB)". Can you check it out and see if it's some error in the actual djvu file? --Waldir (talk) 15:41, 30 July 2015 (UTC)
@Waldir: I see... :( It looks like it didn't handle the last upload correctly. I'll send it again... --Ernest-Mtl (talk) 02:56, 3 August 2015 (UTC)
Hello! I redid the file once again and re-uploaded it... It keeps saying 0x0... :( I guess the best way to fix everything is to keep the first one I made and mark the duplicate pages as not to be proofread... I can't see why it does that... Maybe the stash thing... dunno... I tried another djvu and it worked great... And my local file works fine too... So I guess Commons is not dealing well with this file replacement... :( Sorry. --Ernest-Mtl (talk) 04:24, 3 August 2015 (UTC)
There seems to be a bug affecting some djvu files. I opened phab:T107664 to track it (please subscribe if you have a phabricator account), and maybe someone will figure it out. Let's use the version with the duplicates for now, then :) Thanks so much for all your help! --Waldir (talk) 12:02, 3 August 2015 (UTC)
edit: @Ernest-Mtl: just one last thing: can you download the 0x0 djvu file from wikimedia commons and open it? It seems that it may be corrupt, according to a comment in the bug I linked above. If your local copy opens, but the one downloaded from commons doesn't, then we'll know it's a problem in the upload step.
Alternatively, you could upload the local version to some other web host (Google Drive, Dropbox, Mega...) and share the link so others can do the debugging. --Waldir (talk) 12:08, 3 August 2015 (UTC)
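
One possible way to run the local-versus-downloaded comparison suggested above, in addition to trying to open both files, is to hash them and compare the digests. This is a minimal Python sketch, not something used in this thread: the function names and the paths in the usage comment are hypothetical, and SHA-1 is just a convenient choice of hash.

```python
import hashlib


def sha1_of(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use low for files of ~250 MB.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_are_identical(local_path, downloaded_path):
    """True if both files have the same SHA-1 digest."""
    return sha1_of(local_path) == sha1_of(downloaded_path)


# Hypothetical usage (both paths are placeholders):
# files_are_identical("local/CR V81 P3.djvu", "downloads/CR V81 P3.djvu")
```

If the digests differ, the bytes changed somewhere between the local disk and the downloaded copy; if they match, the stored file is byte-identical to the local one and the problem would more likely lie in how the file is processed after upload.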

@Ernest-Mtl: hi again. Your help is needed to debug this issue :) Can you send me the file you've been trying to upload to Commons through some other means? Then I can test whether the problem is in the file itself or whether the upload somehow messes it up. --Waldir (talk) 20:38, 16 September 2015 (UTC)

@Waldir: Hello! While migrating from Windows 8.1 to 10, I lost my djvu directory and all the 500-something djvu files that were in it... This one was in it as well... :( --Ernest-Mtl (talk) 21:19, 16 September 2015 (UTC)
@Ernest-Mtl: Oh, crap. I'm sorry to hear that. In better news, there's been some progress: in the bug thread, Bawolff suggested trying to open the DjVu files that present this issue using evince, and I found out that every one of them except yours contains errors. So I assume your uploads went well, but something in the processing of the file on the Wikimedia servers went wrong (possibly due to the large file size). By the way, do you think you could download the other four files listed there (File:Ten Years Later.djvu, File:Niva 1906-45.djvu, File:Niva 1906-43.djvu and File:Niva 1894-31.djvu) and fix them with your DjVu tools? --Waldir (talk) 14:31, 18 September 2015 (UTC)

Finally! After over 3 months, the bug has finally been fixed :) On to the actual work, now ;) --Waldir (talk) 21:30, 4 November 2015 (UTC)