Age | Commit message | Author
2019-09-05 | Handle no words found error in a better way so any page that is actually 0 confidence is recognised | Nick White
2019-09-05 | Don't abort analysis if we encounter a hocr with no words, just skip it | Nick White
2019-09-05 | gofmt | Nick White
2019-09-05 | Update Pipeliner interface in getpipelinebook, and update some comments | Nick White
2019-09-04 | Rewrite heartbeat so errors during it will be reported, and the aws api doesn't rely on channels | Nick White
2019-09-04 | Ensure any channels that need to be consumed before goroutine is finished are done in the case of an error | Nick White
2019-09-03 | Improve debug logging | Nick White
2019-09-02 | Log upload and download events | Nick White
2019-09-02 | Add initial getpipelinebook cmd (untested) | Nick White
2019-08-28 | Add medium and bad lines to graphs | Nick White
2019-08-28 | Add standalone graph tool; confgraph | Nick White
2019-08-28 | Move booktopipeline and mkpipeline into bookpipeline/cmd | Nick White
2019-08-28 | Split out bookpipeline to cmd/ | Nick White
2019-08-28 | Move graph function to its own file, and further improve layout | Nick White
2019-08-28 | Separate graph creation from analyse(). | Nick White
2019-08-27 | Print x axis ticks nicely | Nick White
2019-08-27 | Add annotations for pages with confidence below 70 | Nick White
2019-08-27 | Add basic graphing (still work to do, but basics are working) | Nick White
2019-08-27 | Add basic analyse step, working but incomplete | Nick White
2019-08-23 | Expect source files to be .jpg | Nick White
2019-08-23 | Fix gaping bugs by using correct queues and downloads | Nick White
    This has involved refactoring to make the interface simpler, and just use the URLs / IDs for the necessary queues and storage locations, rather than wrap these in functions.
2019-08-22 | Generalise preprocessing and ocring to reuse common code | Nick White
2019-08-22 | Switch to using flag to process command line, and allow different training to be passed | Nick White
2019-08-22 | gofmt | Nick White
2019-08-22 | Update usage string, and comments | Nick White
2019-08-22 | Improve timing of queue checks | Nick White
    Now each queue is checked every 3 minutes, though the channel for each queue check request won't be rechecked until any previous job is completed.
2019-08-22 | Fix process finishing by closing dl channel | Nick White
2019-08-20 | Handle errors properly with goroutines | Nick White
2019-08-20 | Handle errors correctly in main parts of program | Nick White
2019-08-20 | Substantially improve problematic object listing part of API | Nick White
    Switch to regular non-concurrent code; concurrency is better handled by the main program anyway. Now we handle errors properly, and things are way simpler.
2019-08-20 | Add basic OCR support, and reorganise code | Nick White
    The previously committed thing didn't work, as listobjects was sending to a channel synchronously, so it was never being received. The current API isn't great: it mixes synchronous and non-synchronous things, doesn't handle errors consistently, and is generally overcomplicated. That will be fixed soon.
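The channel problem described in this commit body can be illustrated with a small sketch. A send on an unbuffered channel blocks until something receives, so a function that sends from the same goroutine that is later meant to receive never gets past its first send. The names below (`listObjects`, `collect`) are hypothetical stand-ins, not the repository's API.

```go
package main

import "fmt"

// listObjects is a hypothetical stand-in for an object-listing call.
// It sends each name on the channel and closes it when done.
func listObjects(names []string, out chan<- string) {
	for _, n := range names {
		out <- n // on an unbuffered channel this blocks until a receiver is ready
	}
	close(out)
}

// collect runs listObjects in its own goroutine so that each send has a
// receiver, then gathers the results. Calling listObjects(names, out)
// directly here, before the receive loop, would block on the first send.
func collect(names []string) []string {
	out := make(chan string)
	go listObjects(names, out)
	var got []string
	for n := range out {
		got = append(got, n)
	}
	return got
}

func main() {
	fmt.Println(collect([]string{"0001.jpg", "0002.jpg"}))
}
```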
2019-08-20 | Split aws implementation from main.go in pipelinepreprocess | Nick White
2019-08-20 | Export qmsg type | Nick White
2019-08-19 | Fix pipelinepreprocess segfaults | Nick White
    These were caused by using non-pointer methods, which meant that the values set in Init() were not saved.
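The difference between value and pointer receivers behind this fix can be shown in a few lines. The types and field below are hypothetical and chosen for illustration; in the real program the unset field was presumably something that was later dereferenced, which is an assumption here.

```go
package main

import "fmt"

// valueConn and pointerConn are hypothetical types for this example;
// they are not taken from the repository.
type valueConn struct{ bucket string }

// Init has a value receiver, so it modifies a copy of the caller's struct
// and the assignment is lost when the method returns.
func (c valueConn) Init() { c.bucket = "example-bucket" }

type pointerConn struct{ bucket string }

// Init has a pointer receiver, so the assignment is visible to the caller.
func (c *pointerConn) Init() { c.bucket = "example-bucket" }

func main() {
	var v valueConn
	v.Init()
	fmt.Printf("value receiver:   bucket=%q\n", v.bucket)

	var p pointerConn
	p.Init()
	fmt.Printf("pointer receiver: bucket=%q\n", p.bucket)
}
```

With the value receiver the field is still empty after `Init()`, while the pointer receiver leaves it set.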
2019-08-19 | Work in progress rearchitecture to use interfaces; currently pointers are screwy causing segfaults | Nick White
2019-08-13 | Various improvements to pipelinepreprocess | Nick White
    - Ensure temporary directory already being present isn't an issue
    - Remove temporary directory when done with it
    - Ensure any already preprocessed files aren't preprocessed themselves (this could happen in the case of a run stopping half way through)
2019-08-13 | Correct typo in bucket name for pipelinepreprocess; tested and seems to work, remarkably | Nick White
2019-08-13 | Add bonus verbose log points | Nick White
2019-08-13 | Add booktopipeline tool (only lightly tested) | Nick White
2019-08-13 | Reduce SQS WaitTime to something in-spec, and add bonus verbose log points | Nick White
2019-08-13 | Switch ksizes to use by preprocmulti | Nick White
2019-08-13 | Add basic verbose logging capabilities to pipelinepreprocess | Nick White
2019-07-25 | Add first draft of pipelinepreprocess - completely untested, will contain bugs | Nick White
2019-07-19 | rename setupawspipeline to mkpipeline | Nick White
2019-07-19 | rename pipelineaws to setupawspipeline | Nick White
2019-07-19 | Add aws pipeline setup | Nick White
2019-06-25 | Remove 0.6 binarisation threshold option from preprocmulti | Nick White
2019-06-25 | Experimentally adjust wipe threshold according to binarisation level | Nick White
2019-06-11 | Name hocrs as pdfimages does, and preserve entities for hocr | Nick White
2019-06-11 | Add basic utility to turn an eebo xml into a set of hocr files (for hocr2pdf) | Nick White