Age | Commit message | Author |
2021-07-23 | dehyphenate: Update to reflect multiple page support in hocr package | Nick White |
2021-07-23 | iiifdownloader: Fixed error printing | Nick White |
2021-07-23 | gofmt | Nick White |
2021-06-08 | iiifdownloader: remove old and incorrect part which could cause errors | Nick White |
2021-05-11 | Handle pages with png suffix correctly | Nick White |
2021-05-11 | Update dlgbook usage statement to reflect that we include the bookid now | Nick White |
2021-05-10 | dlgbook: Strip special characters from authors as well as titles | Nick White |
2021-05-10 | dlgbook: add google book id to the end of the directory name, and limit lengt... | Nick White |
2021-03-25 | extracthocrlines: ensure opened files are closed promptly, to forego any too ... | Nick White |
2021-03-25 | extracthocrlines: Fix syntax error typo | Nick White |
2021-03-23 | extracthocrlines: Replace -e with -b, its opposite, and make it default | Nick White |
2021-03-23 | extracthocrlines: Skip empty text lines | Nick White |
2021-03-23 | hocr: Add ability to specify a custom image path for hocr line extraction, an... | Nick White |
2021-03-16 | Report book dir to use before starting getgbook, in case it is unwanted | Nick White |
2021-03-16 | dlgbook: add new tool to wrap around getgbook, automatically setting the auth... | Nick White |
2021-02-09 | Add extracthocrlines tool | Nick White |
2021-02-02 | [eeboxmltohocr] Fix bug causing error if there were many hocr files | Nick White |
2021-01-25 | Fix generic IIIF downloading to fix special-case for the Bodleian only | Nick White |
2020-10-26 | [iiifdownloader] Add -insecure flag to ignore TLS errors | Nick White |
2020-10-13 | [iiifdownloader] Catch SIGINT when writing a file to remove half-written file... | Nick White |
2020-10-13 | Improve error handling, and ensure incomplete page downloads are removed | Nick White |
2020-10-12 | [analysestats] skip zero confidence pages from stats | Nick White |
2020-09-28 | [iiifdownloader] Add a TODO to switch to tile based downloading | Nick White |
2020-09-28 | [iiifdownloader] Work around oxford needing the iiif suffix adding to its id | Nick White |
2020-09-28 | [iiifdownloader] Default to iiifmanifest type if none is given and no definit... | Nick White |
2020-09-28 | Make page numbering more generic to handle more iiif variety, and add harvard... | Nick White |
2020-09-28 | Add ability to pass -service to choose which download type to use, plus add a... | Nick White |
2020-09-22 | [analysestats] complete | Nick White |
2020-09-22 | [analysestats] Parse hocr for training used | Nick White |
2020-09-21 | Add wip analysestats command | Nick White |
2020-09-21 | Use strings.Replace rather than strings.ReplaceAll so that it works on older ... | Nick White |
2020-09-12 | Add todos | Nick White |
2020-09-08 | Add the option to force METS usage for BSB | Nick White |
2020-09-08 | Switch to using generic page downloader for BNF | Nick White |
2020-09-08 | Improve urlToPgName so it can be used by BNF too | Nick White |
2020-09-08 | Improve urlToPgName and documentation | Nick White |
2020-09-08 | Sanitise URLs so that // in url doesn't cause issues (bsb site can spew these) | Nick White |
2020-09-08 | Switch from METS to IIIF manifest for BSB downloading, as it returns higher q... | Nick White |
2020-09-08 | [iiifdownloader] BSB downloading works now, by parsing METS XML | Nick White |
2020-09-07 | [iiifdownloader] Split out NoPgNums downloading to its own function | Nick White |
2020-09-07 | Add skeleton of bsb support | Nick White |
2020-08-25 | Move dehyphenate string code into its own function | Nick White |
2020-08-25 | Fixes to dehyphenate | Nick White |
2020-08-25 | Add text mode for dehyphenate tool | Nick White |
2020-06-23 | [iiifdownloader] Only remove 1 duplicate page, as 2nd one may not be duplicat... | Nick White |
2020-06-23 | [iiifdownloader] Add support for BNF urls with a dot after book id | Nick White |
2020-06-23 | Add IIIF downloader, that just supports BNF for now | Nick White |
2020-04-14 | Remove getbests; it belongs with bookpipeline (and putting it there removes a... [v0.1.3] | Nick White |
2020-04-14 | Add godoc documentation | Nick White |
2020-03-13 | Add simple "getbests" utility, useful for statistics gathering | Nick White |