Age | Commit message | Author | |
---|---|---|---|
2019-08-22 | Switch to using flag to process command line, and allow different training to be passed | Nick White | |
2019-08-22 | gofmt | Nick White | |
2019-08-22 | Update usage string, and comments | Nick White | |
2019-08-22 | Improve timing of queue checks | Nick White | |
Now each queue is checked every 3 minutes, though the channel for each queue check request won't be rechecked until any previous job is completed. | |||
2019-08-22 | Fix process finishing by closing dl channel | Nick White | |
2019-08-20 | Handle errors properly with goroutines | Nick White | |
2019-08-20 | Handle errors correctly in main parts of program | Nick White | |
2019-08-20 | Substantially improve problematic object listing part of API | Nick White | |
Switch to regular non-concurrent stuff, concurrency is better handled by the main program anyway. Now we handle errors properly, and things are way simpler. | |||
2019-08-20 | Add basic OCR support, and reorganise code | Nick White | |
The previously committed thing didn't work, as listobjects was sending to a channel synchronously, so it was never being received. The current API isn't great, mixing synchronous and non-synchronous things, not handling errors consistently, and generally is over complicated. That will be fixed soon. | |||
2019-08-20 | Split aws implementation from main.go in pipelinepreprocess | Nick White | |
2019-08-20 | Export qmsg type | Nick White | |
2019-08-19 | Fix pipelinepreprocess segfaults | Nick White | |
These were caused by using non-pointer methods, which meant that the values set in Init() were not saved. | |||
2019-08-19 | Work in progress rearchitecture to use interfaces; currently pointers are screwy causing segfaults | Nick White | |
2019-08-13 | Various improvements to pipelinepreprocess | Nick White | |
Ensure temporary directory already being present isn't an issue; remove temporary directory when done with it; ensure any already preprocessed files aren't preprocessed themselves (this could happen in the case of a run stopping half way through) | |||
2019-08-13 | Correct typo in bucket name for pipelinepreprocess; tested and seems to work, remarkably | Nick White | |
2019-08-13 | Add bonus verbose log points | Nick White | |
2019-08-13 | Add booktopipeline tool (only lightly tested) | Nick White | |
2019-08-13 | Reduce SQS WaitTime to something in-spec, and add bonus verbose log points | Nick White | |
2019-08-13 | Switch ksizes to use by preprocmulti | Nick White | |
2019-08-13 | Add basic verbose logging capabilities to pipelinepreprocess | Nick White | |
2019-07-25 | Add first draft of pipelinepreprocess - completely untested, will contain bugs | Nick White | |
2019-07-19 | rename setupawspipeline to mkpipeline | Nick White | |
2019-07-19 | rename pipelineaws to setupawspipeline | Nick White | |
2019-07-19 | Add aws pipeline setup | Nick White | |
2019-06-25 | Remove 0.6 binarisation threshold option from preprocmulti | Nick White | |
2019-06-25 | Experimentally adjust wipe threshold according to binarisation level | Nick White | |
2019-06-11 | Name hocrs as pdfimages does, and preserve entities for hocr | Nick White | |
2019-06-11 | Add basic utility to turn an eebo xml into a set of hocr files (for hocr2pdf) | Nick White | |
2019-06-03 | Add option to disable wiping for preproc and preprocmulti | Nick White | |
2019-06-03 | Add -m option to wipe to set minimum content area for wipe to proceed | Nick White | |
If content is very light or sparse it may be better to not wipe at all than wipe almost all of the content leaving a small strip. This is done now by aborting the wipe if the detected content takes up less than the minimum % of the page (default is 30%). | |||
2019-05-15 | Return an error if page average calculation can't be done with hocr | Nick White | |
2019-05-14 | Rewrite pgconf to be more accurate by measuring average word confidence rather than average line confidence | Nick White | |
2019-05-14 | pgconf: Don't print NaN if a page has no lines, and show the percentage, rather than float, for easier comparison | Nick White | |
2019-05-14 | Add pgconf tool that prints the overall confidence for a whole page of hocr | Nick White | |
2019-05-14 | Basic cleanup of preprocmulti | Nick White | |
2019-05-14 | gofmt | Nick White | |
2019-05-14 | Add preprocmulti tool, that outputs multiple binarisation options quickly | Nick White | |
2019-05-13 | Add preproc command, that binarises and preprocesses together | Nick White | |
Surprisingly opening an image takes a significant amount of the total processing time, so this actually saves quite a bit of time in the grand scheme of things. | |||
2019-05-13 | Define flags in each test, so they aren't erroneously picked up and used by cmds as they were defined in global package space | Nick White | |
2019-05-13 | Use general integralimg functions for wipe functions | Nick White | |
2019-05-13 | Add -slow flag to test to skip slow tests by default | Nick White | |
2019-05-13 | Reorganise image manipulation to separate integral image parts | Nick White | |
Also unify everything else under preproc/. Note that the UsefulImg interface should be used by the main functions, to simplify things, but this hasn't been done yet. | |||
2019-05-13 | Start switching preproc to use interfaces more | Nick White | |
2019-05-13 | Rename cleanup to wipe, and only export main function | Nick White | |
2019-05-13 | Rename cleanup package to preproc, and add basic cmd version | Nick White | |
2019-05-13 | Improve error handling in sauvola tests | Nick White | |
2019-05-13 | Make cleanup a basic library | Nick White | |
2019-05-13 | Add some basic tests for cleanup | Nick White | |
2019-05-13 | Use the simplified findbestedge function, and simplify code | Nick White | |
2019-04-18 | Simplify cleanup code | Nick White | |
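
The 2019-08-19 entry "Fix pipelinepreprocess segfaults" attributes the crashes to non-pointer methods, which meant values set in Init() were not saved. The following is a minimal Go sketch of that failure mode, not the repository's code: the type names `valueConn` and `ptrConn` and the `bucket` field are hypothetical, chosen only for illustration. A method with a value receiver works on a copy of the struct, so anything it sets is lost when it returns. If the lost field were a pointer, it would stay nil, and a later dereference would panic (this is an assumption about how the segfaults arose).

```go
package main

import "fmt"

// valueConn is a hypothetical type whose Init uses a value receiver.
type valueConn struct{ bucket string }

// Init receives a copy of the struct, so the assignment is lost on return.
func (c valueConn) Init(bucket string) { c.bucket = bucket }

// ptrConn is a hypothetical type whose Init uses a pointer receiver.
type ptrConn struct{ bucket string }

// Init receives a pointer to the caller's struct, so the assignment persists.
func (c *ptrConn) Init(bucket string) { c.bucket = bucket }

func main() {
	var v valueConn
	v.Init("example")
	fmt.Printf("value receiver:   %q\n", v.bucket) // field is still empty

	var p ptrConn
	p.Init("example") // Go takes the address of p automatically
	fmt.Printf("pointer receiver: %q\n", p.bucket) // field was set
}
```

Switching the receiver from `(c valueConn)` to `(c *ptrConn)` is the whole change; call sites on addressable variables stay the same.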