Age         Commit message  Author
2021-07-23  dehyphenate: Update to reflect multiple page support in hocr package [HEAD, master]  Nick White
2021-07-23  iiifdownloader: Fixed error printing  Nick White
2021-07-23  gofmt  Nick White
2021-06-08  iiifdownloader: remove old and incorrect part which could cause errors  Nick White
2021-05-11  Handle pages with png suffix correctly  Nick White
2021-05-11  Update dlgbook usage statement to reflect that we include the bookid now  Nick White
2021-05-10  dlgbook: Strip special characters from authors as well as titles  Nick White
2021-05-10  dlgbook: add google book id to the end of the directory name, and limit lengt...  Nick White
2021-03-25  extracthocrlines: ensure opened files are closed promptly, to forego any too ...  Nick White
2021-03-25  extracthocrlines: Fix syntax error typo  Nick White
2021-03-23  extracthocrlines: Replace -e with -b, its opposite, and make it default  Nick White
2021-03-23  extracthocrlines: Skip empty text lines  Nick White
2021-03-23  hocr: Add ability to specify a custom image path for hocr line extraction, an...  Nick White
2021-03-16  Report book dir to use before starting getgbook, in case it is unwanted  Nick White
2021-03-16  dlgbook: add new tool to wrap around getgbook, automatically setting the auth...  Nick White
2021-02-09  Add extracthocrlines tool  Nick White
2021-02-09  hocr: Use extracted page name for line naming  Nick White
2021-02-09  hocr: Use image specified in ocr_page title, so can support multipage hocrs c...  Nick White
2021-02-02  [eeboxmltohocr] Fix bug causing error if there were many hocr files  Nick White
2021-01-25  Fix generic IIIF downloading to fix special-case for the Bodleian only  Nick White
2020-11-06  Add git clone advice to readme  Nick White
2020-10-26  [iiifdownloader] Add -insecure flag to ignore TLS errors  Nick White
2020-10-13  [iiifdownloader] Catch SIGINT when writing a file to remove half-written file...  Nick White
2020-10-13  Improve error handling, and ensure incomplete page downloads are removed  Nick White
2020-10-12  [analysestats] skip zero confidence pages from stats  Nick White
2020-09-28  [iiifdownloader] Add a TODO to switch to tile based downloading  Nick White
2020-09-28  [iiifdownloader] Work around oxford needing the iiif suffix adding to its id  Nick White
2020-09-28  [iiifdownloader] Default to iiifmanifest type if none is given and no definit...  Nick White
2020-09-28  Make page numbering more generic to handle more iiif variety, and add harvard...  Nick White
2020-09-28  Add ability to pass -service to choose which download type to use, plus add a...  Nick White
2020-09-22  [analysestats] complete  Nick White
2020-09-22  [analysestats] Parse hocr for training used  Nick White
2020-09-21  Add wip analysestats command  Nick White
2020-09-21  Use strings.Replace rather than strings.ReplaceAll so that it works on older ...  Nick White
2020-09-12  Add todos  Nick White
2020-09-08  Add the option to force METS usage for BSB  Nick White
2020-09-08  Switch to using generic page downloader for BNF  Nick White
2020-09-08  Improve urlToPgName so it can be used by BNF too  Nick White
2020-09-08  Improve urlToPgName and documentation  Nick White
2020-09-08  Sanitise URLs so that // in url doesn't cause issues (bsb site can spew these)  Nick White
2020-09-08  Switch from METS to IIIF manifest for BSB downloading, as it returns higher q...  Nick White
2020-09-08  [iiifdownloader] BSB downloading works now, by parsing METS XML  Nick White
2020-09-07  [iiifdownloader] Split out NoPgNums downloading to its own function  Nick White
2020-09-07  Add skeleton of bsb support  Nick White
2020-08-25  Move dehyphenate string code into its own function  Nick White
2020-08-25  Fixes to dehyphenate  Nick White
2020-08-25  Add text mode for dehyphenate tool  Nick White
2020-06-23  [iiifdownloader] Only remove 1 duplicate page, as 2nd one may not be duplicat...  Nick White
2020-06-23  [iiifdownloader] Add support for BNF urls with a dot after book id  Nick White
2020-06-23  Add IIIF downloader, that just supports BNF for now  Nick White