Age | Commit message | Author |
2021-02-09 | Add extracthocrlines tool | Nick White |
2021-02-09 | hocr: Use extracted page name for line naming | Nick White |
2021-02-09 | hocr: Use image specified in ocr_page title, so can support multipage hocrs c... | Nick White |
2021-02-02 | [eeboxmltohocr] Fix bug causing error if there were many hocr files | Nick White |
2021-01-25 | Fix generic IIIF downloading to fix special-case for the Bodleian only | Nick White |
2020-11-06 | Add git clone advice to readme | Nick White |
2020-10-26 | [iiifdownloader] Add -insecure flag to ignore TLS errors | Nick White |
2020-10-13 | [iiifdownloader] Catch SIGINT when writing a file to remove half-written file... | Nick White |
2020-10-13 | Improve error handling, and ensure incomplete page downloads are removed | Nick White |
2020-10-12 | [analysestats] skip zero confidence pages from stats | Nick White |
2020-09-28 | [iiifdownloader] Add a TODO to switch to tile based downloading | Nick White |
2020-09-28 | [iiifdownloader] Work around oxford needing the iiif suffix adding to its id | Nick White |
2020-09-28 | [iiifdownloader] Default to iiifmanifest type if none is given and no definit... | Nick White |
2020-09-28 | Make page numbering more generic to handle more iiif variety, and add harvard... | Nick White |
2020-09-28 | Add ability to pass -service to choose which download type to use, plus add a... | Nick White |
2020-09-22 | [analysestats] complete | Nick White |
2020-09-22 | [analysestats] Parse hocr for training used | Nick White |
2020-09-21 | Add wip analysestats command | Nick White |
2020-09-21 | Use strings.Replace rather than strings.ReplaceAll so that it works on older ... | Nick White |
2020-09-12 | Add todos | Nick White |
2020-09-08 | Add the option to force METS usage for BSB | Nick White |
2020-09-08 | Switch to using generic page downloader for BNF | Nick White |
2020-09-08 | Improve urlToPgName so it can be used by BNF too | Nick White |
2020-09-08 | Improve urlToPgName and documentation | Nick White |
2020-09-08 | Sanitise URLs so that // in url doesn't cause issues (bsb site can spew these) | Nick White |
2020-09-08 | Switch from METS to IIIF manifest for BSB downloading, as it returns higher q... | Nick White |
2020-09-08 | [iiifdownloader] BSB downloading works now, by parsing METS XML | Nick White |
2020-09-07 | [iiifdownloader] Split out NoPgNums downloading to its own function | Nick White |
2020-09-07 | Add skeleton of bsb support | Nick White |
2020-08-25 | Move dehyphenate string code into its own function | Nick White |
2020-08-25 | Fixes to dehyphenate | Nick White |
2020-08-25 | Add text mode for dehyphenate tool | Nick White |
2020-06-23 | [iiifdownloader] Only remove 1 duplicate page, as 2nd one may not be duplicat... | Nick White |
2020-06-23 | [iiifdownloader] Add support for BNF urls with a dot after book id | Nick White |
2020-06-23 | Add IIIF downloader, that just supports BNF for now | Nick White |
2020-06-01 | Mention documentation URL | Nick White |
2020-04-14 | Remove getbests; it belongs with bookpipeline (and putting it there removes a... (tag: v0.1.3) | Nick White |
2020-04-14 | Add godoc documentation | Nick White |
2020-03-13 | Update go.mod now that getbests util has a dependency (tag: v0.1.2) | Nick White |
2020-03-13 | Add simple "getbests" utility, useful for statistics gathering | Nick White |
2020-03-13 | Add copyright statements to each file | Nick White |
2020-02-28 | Add license, copyright statements and a basic readme (tag: v0.1.1) | Nick White |
2020-02-27 | Add go.mod (tag: v0.1.0) | Nick White |
2020-02-27 | Reorganise all commands to be behind cmd/ | Nick White |
2020-02-20 | [pare-gt] gofmt | Nick White |
2020-02-20 | [pare-gt] Fix sampling formula, make robust in the face of a 100% sample requ... | Nick White |
2020-02-20 | [pare-gt] Add some tests, and make deterministic | Nick White |
2020-02-20 | [pare-gt] gofmt | Nick White |
2020-02-19 | Split sampling functionality in pare-gt into a separate function that can be ... | Nick White |
2020-02-11 | Add pare-gt tool | Nick White |