Age | Commit message | Author | |
---|---|---|---|
2019-09-18 | Add start of lspipeline | Nick White | |
2019-09-17 | gofmt | Nick White | |
2019-09-16 | Be more careful to try to grab the message after a heartbeat failure more quickly | Nick White | |
Rather than waiting for the whole length of a visibility timeout, in which time another process may grab the message, instead wait a short amount of time, each time the message is searched for. Also add a bit more logging. | |||
2019-09-14 | Ensure enough time has elapsed before looking for the message to reget in the case of heartbeat running out | Nick White | |
2019-09-12 | Don't prefix date/time to logs, as logger will store that anyway | Nick White | |
2019-09-11 | Increase size of graph to 4k | Nick White | |
2019-09-11 | Fix bug with graph that prevented the ticks from being correct, thus ruining the graph | Nick White | |
2019-09-11 | Work around the SQS limit of 12 hours of visibility timeout | Nick White | |
This is done by checking for the error that is emitted in such a case, and if it's found trying several times to find the message back in the queue, and returning the message with an updated handle back to the caller to use in the future. | |||
2019-09-06 | Add flags to disable checking various queues | Nick White | |
2019-09-05 | Handle no words found error in a better way so any page that is actually 0 confidence is recognised | Nick White | |
2019-09-05 | Don't abort analysis if we encounter a hocr with no words, just skip it | Nick White | |
2019-09-05 | gofmt | Nick White | |
2019-09-05 | Update Pipeliner interface in getpipelinebook, and update some comments | Nick White | |
2019-09-04 | Rewrite heartbeat so errors during it will be reported, and the aws api doesn't rely on channels | Nick White | |
2019-09-04 | Ensure any channels that need to be consumed before goroutine is finished are done in the case of an error | Nick White | |
2019-09-03 | Improve debug logging | Nick White | |
2019-09-02 | Log upload and download events | Nick White | |
2019-09-02 | Add initial getpipelinebook cmd (untested) | Nick White | |
2019-08-28 | Add medium and bad lines to graphs | Nick White | |
2019-08-28 | Add standalone graph tool; confgraph | Nick White | |
2019-08-28 | Move booktopipeline and mkpipeline into bookpipeline/cmd | Nick White | |
2019-08-28 | Split out bookpipeline to cmd/ | Nick White | |
2019-08-28 | Move graph function to its own file, and further improve layout | Nick White | |
2019-08-28 | Separate graph creation from analyse(). | Nick White | |
2019-08-27 | Print x axis ticks nicely | Nick White | |
2019-08-27 | Add annotations for pages with confidence below 70 | Nick White | |
2019-08-27 | Add basic graphing (still work to do, but basics are working) | Nick White | |
2019-08-27 | Add basic analyse step, working but incomplete | Nick White | |
2019-08-23 | Expect source files to be .jpg | Nick White | |
2019-08-23 | Fix gaping bugs by using correct queues and downloads | Nick White | |
This has involved refactoring to make the interface simpler, and just use the URLs / IDs for the necessary queues and storage locations, rather than wrap these in functions. | |||
2019-08-22 | Generalise preprocessing and ocring to reuse common code | Nick White | |
2019-08-22 | Switch to using flag to process command line, and allow different training to be passed | Nick White | |
2019-08-22 | gofmt | Nick White | |
2019-08-22 | Update usage string, and comments | Nick White | |
2019-08-22 | Improve timing of queue checks | Nick White | |
Now each queue is checked every 3 minutes, though the channel for each queue check request won't be rechecked until any previous job is completed. | |||
2019-08-22 | Fix process finishing by closing dl channel | Nick White | |
2019-08-20 | Handle errors properly with goroutines | Nick White | |
2019-08-20 | Handle errors correctly in main parts of program | Nick White | |
2019-08-20 | Substantially improve problematic object listing part of API | Nick White | |
Switch to regular non-concurrent stuff, concurrency is better handled by the main program anyway. Now we handle errors properly, and things are way simpler. | |||
2019-08-20 | Add basic OCR support, and reorganise code | Nick White | |
The previously committed thing didn't work, as listobjects was sending to a channel synchronously, so it was never being received. The current API isn't great, mixing synchronous and non-synchronous things, not handling errors consistently, and generally is over complicated. That will be fixed soon. | |||
2019-08-20 | Split aws implementation from main.go in pipelinepreprocess | Nick White | |
2019-08-20 | Export qmsg type | Nick White | |
2019-08-19 | Fix pipelinepreprocess segfaults | Nick White | |
These were caused by using non-pointer methods, which meant that the values set in Init() were not saved. | |||
2019-08-19 | Work in progress rearchitecture to use interfaces; currently pointers are screwy causing segfaults | Nick White | |
2019-08-13 | Various improvements to pipelinepreprocess | Nick White | |
- Ensure temporary directory already being present isn't an issue - Remove temporary directory when done with it - Ensure any already preprocessed files aren't preprocessed themselves (this could happen in the case of a run stopping half way through) | |||
2019-08-13 | Correct typo in bucket name for pipelinepreprocess; tested and seems to work, remarkably | Nick White | |
2019-08-13 | Add bonus verbose log points | Nick White | |
2019-08-13 | Add booktopipeline tool (only lightly tested) | Nick White | |
2019-08-13 | Reduce SQS WaitTime to something in-spec, and add bonus verbose log points | Nick White | |
2019-08-13 | Switch ksizes to use by preprocmulti | Nick White | |