Date | Commit message | Author
---|---|---
2019-09-04 | Rewrite heartbeat so errors during it will be reported, and the aws api doesn't rely on channels | Nick White
2019-09-04 | Ensure any channels that need to be consumed before the goroutine is finished are done in the case of an error | Nick White
2019-09-03 | Improve debug logging | Nick White
2019-09-02 | Log upload and download events | Nick White
2019-09-02 | Add initial getpipelinebook cmd (untested) | Nick White
2019-08-28 | Add medium and bad lines to graphs | Nick White
2019-08-28 | Add standalone graph tool, confgraph | Nick White
2019-08-28 | Move booktopipeline and mkpipeline into bookpipeline/cmd | Nick White
2019-08-28 | Split out bookpipeline to cmd/ | Nick White
2019-08-28 | Move graph function to its own file, and further improve layout | Nick White
2019-08-28 | Separate graph creation from analyse() | Nick White
2019-08-27 | Print x axis ticks nicely | Nick White
2019-08-27 | Add annotations for pages with confidence below 70 | Nick White
2019-08-27 | Add basic graphing (still work to do, but basics are working) | Nick White
2019-08-27 | Add basic analyse step, working but incomplete | Nick White
2019-08-23 | Expect source files to be .jpg | Nick White
2019-08-23 | Fix gaping bugs by using correct queues and downloads. This has involved refactoring to make the interface simpler, and just use the URLs / IDs for the necessary queues and storage locations, rather than wrap these in functions. | Nick White
2019-08-22 | Generalise preprocessing and ocring to reuse common code | Nick White
2019-08-22 | Switch to using flag to process command line, and allow different training to be passed | Nick White
2019-08-22 | gofmt | Nick White
2019-08-22 | Update usage string, and comments | Nick White
2019-08-22 | Improve timing of queue checks. Now each queue is checked every 3 minutes, though the channel for each queue check request won't be rechecked until any previous job is completed. | Nick White
2019-08-22 | Fix process finishing by closing dl channel | Nick White
2019-08-20 | Handle errors properly with goroutines | Nick White
2019-08-20 | Handle errors correctly in main parts of program | Nick White
2019-08-20 | Substantially improve problematic object listing part of API. Switch to regular non-concurrent code; concurrency is better handled by the main program anyway. Now we handle errors properly, and things are much simpler. | Nick White
2019-08-20 | Add basic OCR support, and reorganise code. The previously committed version didn't work, as listobjects was sending to a channel synchronously, so the value was never received. The current API isn't great: it mixes synchronous and asynchronous calls, doesn't handle errors consistently, and is generally overcomplicated. That will be fixed soon. | Nick White
2019-08-20 | Split aws implementation from main.go in pipelinepreprocess | Nick White
2019-08-20 | Export qmsg type | Nick White
2019-08-19 | Fix pipelinepreprocess segfaults. These were caused by using non-pointer methods, which meant that the values set in Init() were not saved. | Nick White
2019-08-19 | Work-in-progress rearchitecture to use interfaces; currently pointers are screwy, causing segfaults | Nick White
2019-08-13 | Various improvements to pipelinepreprocess: ensure an already-present temporary directory isn't an issue; remove the temporary directory when done with it; ensure any already preprocessed files aren't themselves preprocessed again (this could happen if a run stops halfway through) | Nick White
2019-08-13 | Correct typo in bucket name for pipelinepreprocess; tested and seems to work, remarkably | Nick White
2019-08-13 | Add bonus verbose log points | Nick White
2019-08-13 | Add booktopipeline tool (only lightly tested) | Nick White
2019-08-13 | Reduce SQS WaitTime to something in-spec, and add bonus verbose log points | Nick White
2019-08-13 | Switch ksizes to those used by preprocmulti | Nick White
2019-08-13 | Add basic verbose logging capabilities to pipelinepreprocess | Nick White
2019-07-25 | Add first draft of pipelinepreprocess - completely untested, will contain bugs | Nick White
2019-07-19 | rename setupawspipeline to mkpipeline | Nick White
2019-07-19 | rename pipelineaws to setupawspipeline | Nick White
2019-07-19 | Add aws pipeline setup | Nick White
2019-06-25 | Remove 0.6 binarisation threshold option from preprocmulti | Nick White
2019-06-25 | Experimentally adjust wipe threshold according to binarisation level | Nick White
2019-06-11 | Name hocrs as pdfimages does, and preserve entities for hocr | Nick White
2019-06-11 | Add basic utility to turn an eebo xml into a set of hocr files (for hocr2pdf) | Nick White
2019-06-03 | Add option to disable wiping for preproc and preprocmulti | Nick White
2019-06-03 | Add -m option to wipe to set minimum content area for wipe to proceed. If content is very light or sparse, it may be better not to wipe at all than to wipe away almost all of the content, leaving only a small strip. This is now done by aborting the wipe if the detected content takes up less than the minimum percentage of the page (default 30%). | Nick White
2019-05-15 | Return an error if page average calculation can't be done with hocr | Nick White
2019-05-14 | Rewrite pgconf to be more accurate by measuring average word confidence rather than average line confidence | Nick White
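The pipelinepreprocess segfault fix recorded on 2019-08-19 comes down to a general Go rule: a method with a value (non-pointer) receiver operates on a copy of the struct, so any fields it sets in an `Init()`-style method are discarded when it returns, and a later dereference of a still-nil field panics. The sketch below illustrates this under stated assumptions: the `conn`, `client` and `initer` types and the `InitByValue`/`InitByPointer` methods are hypothetical stand-ins, not the project's actual code.

```go
package main

import "fmt"

// client is a hypothetical stand-in for an AWS service client.
type client struct{ name string }

// conn is a hypothetical stand-in for a pipeline connection type.
type conn struct {
	client *client
}

// InitByValue has a value receiver: c is a copy, so this assignment
// is lost when the method returns.
func (c conn) InitByValue() {
	c.client = &client{name: "s3"}
}

// InitByPointer has a pointer receiver: it modifies the caller's struct.
func (c *conn) InitByPointer() {
	c.client = &client{name: "s3"}
}

// initer is a hypothetical interface, like those introduced by the rearchitecture.
type initer interface {
	InitByPointer()
}

// Only *conn has InitByPointer in its method set, so only a pointer
// satisfies initer; assigning a plain conn here would not compile.
var _ initer = (*conn)(nil)

func main() {
	var a conn
	a.InitByValue()
	// true: the client was never saved, so a.client.name would panic
	fmt.Println(a.client == nil)

	var b conn
	b.InitByPointer()
	fmt.Println(b.client.name) // s3
}
```

Using pointer receivers for any method that sets state, and storing pointers in the interface values, lets the compiler reject the value-copy mistake rather than leaving it to surface as a runtime nil pointer dereference.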