Age         Commit message  Author
2020-10-13  Improve error handling, and ensure incomplete page downloads are removed  Nick White
2020-10-12  [analysestats] skip zero confidence pages from stats  Nick White
2020-09-28  [iiifdownloader] Add a TODO to switch to tile based downloading  Nick White
2020-09-28  [iiifdownloader] Work around oxford needing the iiif suffix adding to its id  Nick White
2020-09-28  [iiifdownloader] Default to iiifmanifest type if none is given and no definitive service can be found  Nick White
2020-09-28  Make page numbering more generic to handle more iiif variety, and add harvardartmuseums iiif manifest example url  Nick White
2020-09-28  Add ability to pass -service to choose which download type to use, plus add a -bookdir flag to set download directory  Nick White
2020-09-22  [analysestats] complete  Nick White
2020-09-22  [analysestats] Parse hocr for training used  Nick White
2020-09-21  Add wip analysestats command  Nick White
2020-09-21  Use strings.Replace rather than strings.ReplaceAll so that it works on older versions of go  Nick White
2020-09-12  Add todos  Nick White
2020-09-08  Add the option to force METS usage for BSB  Nick White
2020-09-08  Switch to using generic page downloader for BNF  Nick White
2020-09-08  Improve urlToPgName so it can be used by BNF too  Nick White
2020-09-08  Improve urlToPgName and documentation  Nick White
2020-09-08  Sanitise URLs so that // in url doesn't cause issues (bsb site can spew these)  Nick White
2020-09-08  Switch from METS to IIIF manifest for BSB downloading, as it returns higher quality images with no visible watermark  Nick White
2020-09-08  [iiifdownloader] BSB downloading works now, by parsing METS XML  Nick White
2020-09-07  [iiifdownloader] Split out NoPgNums downloading to its own function  Nick White
2020-09-07  Add skeleton of bsb support  Nick White
2020-08-25  Move dehyphenate string code into its own function  Nick White
2020-08-25  Fixes to dehyphenate  Nick White
            - Ensure a final hyphen on the last word of a page isn't removed
            - Only try to add a next word if there is one to take
            - Ensure that if a single word is on the following line, which is taken, then the line is blanked
2020-08-25  Add text mode for dehyphenate tool  Nick White
2020-06-23  [iiifdownloader] Only remove 1 duplicate page, as 2nd one may not be duplicate (no way of knowing as if it is its downsized)  Nick White
2020-06-23  [iiifdownloader] Add support for BNF urls with a dot after book id  Nick White
2020-06-23  Add IIIF downloader, that just supports BNF for now  Nick White
2020-06-01  Mention documentation URL  Nick White
2020-04-14  Remove getbests; it belongs with bookpipeline (and putting it there removes an annoying circular dependency) [v0.1.3]  Nick White
2020-04-14  Add godoc documentation  Nick White
2020-03-13  Update go.mod now that getbests util has a dependency [v0.1.2]  Nick White
2020-03-13  Add simple "getbests" utility, useful for statistics gathering  Nick White
2020-03-13  Add copyright statements to each file  Nick White
2020-02-28  Add license, copyright statements and a basic readme [v0.1.1]  Nick White
2020-02-27  Add go.mod [v0.1.0]  Nick White
2020-02-27  Reorganise all commands to be behind cmd/  Nick White
2020-02-20  [pare-gt] gofmt  Nick White
2020-02-20  [pare-gt] Fix sampling formula, make robust in the face of a 100% sample request, and fix up test output  Nick White
2020-02-20  [pare-gt] Add some tests, and make deterministic  Nick White
            These tests have uncovered at least 2 bugs that haven't yet been squashed:
            - 1% selection hangs
            - 20% selection only takes as many as 10%
2020-02-20  [pare-gt] gofmt  Nick White
2020-02-19  Split sampling functionality in pare-gt into a separate function that can be tested (coming soon)  Nick White
2020-02-11  Add pare-gt tool  Nick White
2020-01-22  Fix up boxtotxt tool  Nick White
2020-01-22  Add GetWordConfs function to hocr pkg  Nick White
2020-01-22  Add simple boxtotxt tool  Nick White
2019-11-12  Clean up, and add comment explaining design choice to fonttobytes  Nick White
2019-11-12  Add fonttobytes, to embed the font into pdf tools in due course  Nick White
2019-10-31  Export a couple of more generally useful functions  Nick White
2019-10-30  Simplify and document hocr package slightly better  Nick White
2019-10-23  Add tiny doc.go, hopefully will ensure 'go get rescribe.xyz/utils' doesn't return an error for lack of .go files  Nick White