Age  Commit message  Author
2020-09-08  Improve urlToPgName and documentation  Nick White
2020-09-08  Sanitise URLs so that // in url doesn't cause issues (bsb site can spew these)  Nick White
2020-09-08  Switch from METS to IIIF manifest for BSB downloading, as it returns higher quality images with no visible watermark  Nick White
2020-09-08  [iiifdownloader] BSB downloading works now, by parsing METS XML  Nick White
2020-09-07  [iiifdownloader] Split out NoPgNums downloading to its own function  Nick White
2020-09-07  Add skeleton of bsb support  Nick White
2020-08-25  Move dehyphenate string code into its own function  Nick White
2020-08-25  Fixes to dehyphenate  Nick White
    - Ensure a final hyphen on the last word of a page isn't removed
    - Only try to add a next word if there is one to take
    - Ensure that if a single word is on the following line, which is taken, then the line is blanked
2020-08-25  Add text mode for dehyphenate tool  Nick White
2020-06-23  [iiifdownloader] Only remove 1 duplicate page, as 2nd one may not be duplicate (no way of knowing, as if it is, it's downsized)  Nick White
2020-06-23  [iiifdownloader] Add support for BNF urls with a dot after book id  Nick White
2020-06-23  Add IIIF downloader, that just supports BNF for now  Nick White
2020-06-01  Mention documentation URL  Nick White
2020-04-14  Remove getbests; it belongs with bookpipeline (and putting it there removes an annoying circular dependency)  v0.1.3  Nick White
2020-04-14  Add godoc documentation  Nick White
2020-03-13  Update go.mod now that getbests util has a dependency  v0.1.2  Nick White
2020-03-13  Add simple "getbests" utility, useful for statistics gathering  Nick White
2020-03-13  Add copyright statements to each file  Nick White
2020-02-28  Add license, copyright statements and a basic readme  v0.1.1  Nick White
2020-02-27  Add go.mod  v0.1.0  Nick White
2020-02-27  Reorganise all commands to be behind cmd/  Nick White
2020-02-20  [pare-gt] gofmt  Nick White
2020-02-20  [pare-gt] Fix sampling formula, make robust in the face of a 100% sample request, and fix up test output  Nick White
2020-02-20  [pare-gt] Add some tests, and make deterministic  Nick White
    These tests have uncovered at least 2 bugs that haven't yet been squashed:
    - 1% selection hangs
    - 20% selection only takes as many as 10%
2020-02-20  [pare-gt] gofmt  Nick White
2020-02-19  Split sampling functionality in pare-gt into a separate function that can be tested (coming soon)  Nick White
2020-02-11  Add pare-gt tool  Nick White
2020-01-22  Fix up boxtotxt tool  Nick White
2020-01-22  Add GetWordConfs function to hocr pkg  Nick White
2020-01-22  Add simple boxtotxt tool  Nick White
2019-11-12  Clean up, and add comment explaining design choice to fonttobytes  Nick White
2019-11-12  Add fonttobytes, to embed the font into pdf tools in due course  Nick White
2019-10-31  Export a couple of more generally useful functions  Nick White
2019-10-30  Simplify and document hocr package slightly better  Nick White
2019-10-23  Add tiny doc.go, hopefully will ensure 'go get rescribe.xyz/utils' doesn't return an error for lack of .go files  Nick White
2019-10-23  Make bucket-lines and related packages more robust  Nick White
    bucket-lines would crash for any line that didn't have a corresponding image. Lines which weren't grayscale would also cause crashes; now they are just converted to grayscale if necessary. As a bonus, lines in JPEG can also be decoded successfully.
2019-10-08  Remove parts that have been moved elsewhere, and rename to rescribe.xyz/utils  Nick White
    bookpipeline is now at rescribe.xyz/bookpipeline
    preproc is now at rescribe.xyz/preproc
    integralimg is now at rescribe.xyz/preproc/integralimg
2019-10-07  Ensure wipe pipeline uses the expected png files  Nick White
2019-10-02  Improve usage notice for booktopipeline  Nick White
2019-10-02  Add -prebinarised flag to booktopipeline  Nick White
2019-10-02  gofmt  Nick White
2019-10-02  Add wipeonly queue and functionality  Nick White
    This is useful for prebinarised images, which don't need full preprocessing, but do require wiping, albeit with a more conservative threshold.
2019-09-27  Improve wiping procedure to work better with 2 column layouts  Nick White
2019-09-27  Fix crash bug when graph was used on source with less than 10 pages  Nick White
2019-09-27  One more update of graph.go to correspond to new go-chart, and improve usage for wipe  Nick White
2019-09-27  Hardcode to ignore "workhorse" from logs  Nick White
2019-09-27  Update usage of go-chart to correspond to latest version of library  Nick White
2019-09-24  gofmt  Nick White
2019-09-24  Improve ssh logs; ensure only fully operational servers are tried, and ensure connections to new ips not in known_hosts still succeed  Nick White
2019-09-24  Do ssh log collection concurrently  Nick White
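
The 2020-09-08 entry about sanitising URLs mentions that a doubled slash (//) in a URL can cause issues. The log does not show how the repository handles this, so the following is only a minimal sketch of one possible approach, not the project's actual code. The function name sanitiseURL and the example URL are hypothetical. It collapses runs of slashes everywhere after the scheme separator, which also affects any query string, so a real implementation may need to be more careful.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitiseURL is a hypothetical helper (not taken from the repository).
// It keeps the "scheme://" prefix intact and collapses every run of
// slashes in the remainder of the URL down to a single slash.
func sanitiseURL(u string) string {
	prefix, rest := "", u
	if scheme, after, found := strings.Cut(u, "://"); found {
		prefix, rest = scheme+"://", after
	}
	// Repeat until no doubled slash remains, so "///" is handled too.
	for strings.Contains(rest, "//") {
		rest = strings.ReplaceAll(rest, "//", "/")
	}
	return prefix + rest
}

func main() {
	// Example URL is made up for illustration.
	fmt.Println(sanitiseURL("https://example.org//iiif//book//manifest"))
}
```

Splitting off the scheme first is what stops the separator in "https://" from being collapsed along with the unwanted slashes.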