Age  Commit message  Author
2019-02-25  Generalise get text from hocr lines  Nick White
2019-02-25  Add tool to extract plain text from hocr  Nick White
2019-02-15  Separate out binarize into a package, and start adding tests for it  Nick White
2019-01-30  Set window size automatically based on resolution  Nick White
2019-01-30  Remove dependency on Imger package  Nick White
2019-01-30  Add integral image functionality to enable massive speedup of Sauvola  Nick White
    Note that there are some very small differences to the output compared to the basic algorithm, but this doesn't make much difference. This is due to minor differences with the standard deviation calculation throughout, and with mean calculation at edges, for reasons I'm unclear about.

    WIP integral image speedup. mean is working
    Very WIP, but mean is perfect once full window is used
    Integral version all working!
    Remove debugging info
    Organise code better
2019-01-29  Switch binarization to Sauvola algorithm  Nick White
2019-01-25  Simplify writing of sort functions in line pkg  Nick White
2019-01-25  gofmt  Nick White
2019-01-25  Use consistent naming for .prob and .hocr OcrName  Nick White
2019-01-25  Add html output including all images, by writing them to an html directory  Nick White
2019-01-25  Rename line-conf-avg to avg-lines  Nick White
2019-01-25  Rewrite line-conf-avg to use libraries, and support hocr  Nick White
2019-01-25  Update location of libraries  Nick White
2019-01-25  Add simple Otsu binarize tool (written a while ago)  Nick White
2019-01-25  Reorganisation and cleanup  Nick White
2019-01-24  Fix bug: if non-prob/hocr file was encountered a dupe old line could be processed  Nick White
2019-01-24  Export hocr Parse() function as it's likely to be useful elsewhere  Nick White
2019-01-24  Allow the specs for buckets to be set using a json file in an argument  Nick White
2019-01-24  Better separation between library and tool  Nick White
2019-01-24  Merge bucket-lines-{prob,hocr} into one tool called bucket-lines, that uses the filename extension to determine how to process the lines  Nick White
2019-01-24  Add -d to -hocr tool, and improve documentation  Nick White
2019-01-24  Rename ocropus bucket tool, add -d option, and improve documentation  Nick White
2019-01-24  Simplify .prob processing  Nick White
2019-01-24  Get -tess tool to use generic bucket functions too. things are in pretty good shape now, just a few small todos left  Nick White
2019-01-23  Track and print bucket stats generically  Nick White
2019-01-23  Create general BucketUp function, use it for line-conf-buckets  Nick White
2019-01-23  Update line-conf-buckets to mostly use package functions too.  Nick White
    Working now, but needs more consolidation to be worth it.
2019-01-23  Separate out hocr parts from line parts  Nick White
2019-01-23  Move image copying out to an interface function, so I can share code with line-conf-buckets easily and well  Nick White
2019-01-23  Start separating out functionality into separate package; working, but more to do, see TODO in hocr.go  Nick White
2019-01-19  Ensure files are closed as soon as they're finished with  Nick White
2019-01-19  Add linebreaks after lines, and add TODO  Nick White
2019-01-19  line-conf-buckets-tess is done now. could do with cleaning up, but seems to be working well  Nick White
2019-01-19  wip line-conf-buckets-tess, saving lines is working, still stuff to do  Nick White
2019-01-03  Add line-conf-buckets, to filter out and find potential ground-truth  Nick White
2019-01-03  Sort lines by default, and add html output  Nick White
2019-01-03  Add basic working line confidence average tool (for ocropus .prob parsing)  Nick White
2019-01-03  Initial commit  Nick White