Age         Commit message  Author
2019-06-25  Experimentally adjust wipe threshold according to binarisation level  Nick White
2019-06-11  Name hocrs as pdfimages does, and preserve entities for hocr  Nick White
2019-06-11  Add basic utility to turn an eebo xml into a set of hocr files (for hocr2pdf)  Nick White
2019-06-03  Add option to disable wiping for preproc and preprocmulti  Nick White
2019-06-03  Add -m option to wipe to set minimum content area for wipe to proceed  Nick White
2019-05-15  Return an error if page average calculation cant be done with hocr  Nick White
2019-05-14  Rewrite pgconf to be more accurate by measuring average word confidence rathe...  Nick White
2019-05-14  pgconf: Don't print NaN if a page has no lines, and show the percentage, rath...  Nick White
2019-05-14  Add pgconf tool that prints the overall confidence for a whole page of hocr  Nick White
2019-05-14  Basic cleanup of preprocmulti  Nick White
2019-05-14  gofmt  Nick White
2019-05-14  Add preprocmulti tool, that outputs multiple binarisation options quickly  Nick White
2019-05-13  Add preproc command, that binarises and preprocesses together  Nick White
2019-05-13  Define flags in each test, so they arent erroneously picked up and used by cm...  Nick White
2019-05-13  Use general integralimg functions for wipe functions  Nick White
2019-05-13  Add -slow flag to test to skip slow tests by default  Nick White
2019-05-13  Reorganise image manipulation to separate integral image parts  Nick White
2019-05-13  Start switching preproc to use interfaces more  Nick White
2019-05-13  Rename cleanup to wipe, and only export main function  Nick White
2019-05-13  Rename cleanup package to preproc, and add basic cmd version  Nick White
2019-05-13  Improve error handling in sauvola tests  Nick White
2019-05-13  Make cleanup a basic library  Nick White
2019-05-13  Add some basic tests for cleanup  Nick White
2019-05-13  Use the simplified findbestedge function, and simplify code  Nick White
2019-04-18  Simplify cleanup code  Nick White
2019-04-18  Put edge in middle of window slice, rather than at left side, and gofmt  Nick White
2019-04-18  Add basic cleanup tool; working, but more refinements planned.  Nick White
2019-04-17  Add basic dehyphenate tool  Nick White
2019-03-28  Remove todo for integral image testing for now  Nick White
2019-03-28  Improve tests; test regular sauvola, and add option to update golden files  Nick White
2019-03-26  Add zeroinv option for binarize command  Nick White
2019-03-26  Move sauvola binarization tool to cmd/binarize  Nick White
2019-03-26  Better error handling with hocr lines  Nick White
2019-02-25  Generalise get text from hocr lines  Nick White
2019-02-25  Add tool to extract plain text from hocr  Nick White
2019-02-15  Separate out binarize into a package, and start adding tests for it  Nick White
2019-01-30  Set window size automatically based on resolution  Nick White
2019-01-30  Remove dependency on Imger package  Nick White
2019-01-30  Add integral image functionality to enable massive speedup of Sauvola  Nick White
2019-01-29  Switch binarization to Sauvola algorithm  Nick White
2019-01-25  Simplify writing of sort functions in line pkg  Nick White
2019-01-25  gofmt  Nick White
2019-01-25  Use consistent naming for .prob and .hocr OcrName  Nick White
2019-01-25  Add html output including all images, by writing them to an html directory  Nick White
2019-01-25  Rename line-conf-avg to avg-lines  Nick White
2019-01-25  Rewrite line-conf-avg to use libraries, and support hocr  Nick White
2019-01-25  Update location of libraries  Nick White
2019-01-25  Add simple Otsu binarize tool (written a while ago)  Nick White
2019-01-25  Reorganisation and cleanup  Nick White
2019-01-24  Fix bug: if non-prob/hocr file was encountered a dupe old line could be proce...  Nick White