| Age | Commit message | Author | |
|---|---|---|---|
| 2019-08-28 | Move booktopipeline and mkpipeline into bookpipeline/cmd | Nick White | |
| 2019-08-28 | Split out bookpipeline to cmd/ | Nick White | |
| 2019-08-28 | Move graph function to its own file, and further improve layout | Nick White | |
| 2019-08-28 | Separate graph creation from analyse(). | Nick White | |
| 2019-08-27 | Print x axis ticks nicely | Nick White | |
| 2019-08-27 | Add annotations for pages with confidence below 70 | Nick White | |
| 2019-08-27 | Add basic graphing (still work to do, but basics are working) | Nick White | |
| 2019-08-27 | Add basic analyse step, working but incomplete | Nick White | |
| 2019-08-23 | Expect source files to be .jpg | Nick White | |
| 2019-08-23 | Fix gaping bugs by using correct queues and downloads | Nick White | |
| This has involved refactoring to make the interface simpler, and just use the URLs / IDs for the necessary queues and storage locations, rather than wrap these in functions. | |||
| 2019-08-22 | Generalise preprocessing and ocring to reuse common code | Nick White | |
| 2019-08-22 | Switch to using flag to process command line, and allow different training to be passed | Nick White | |
| 2019-08-22 | gofmt | Nick White | |
| 2019-08-22 | Update usage string, and comments | Nick White | |
| 2019-08-22 | Improve timing of queue checks | Nick White | |
| Now each queue is checked every 3 minutes, though the channel for each queue check request won't be rechecked until any previous job is completed. | |||
| 2019-08-22 | Fix process finishing by closing dl channel | Nick White | |
| 2019-08-20 | Handle errors properly with goroutines | Nick White | |
| 2019-08-20 | Handle errors correctly in main parts of program | Nick White | |
| 2019-08-20 | Substantially improve problematic object listing part of API | Nick White | |
| Switch to regular non-concurrent stuff, concurrency is better handled by the main program anyway. Now we handle errors properly, and things are way simpler. | |||
| 2019-08-20 | Add basic OCR support, and reorganise code | Nick White | |
| The previously committed thing didn't work, as listobjects was sending to a channel synchronously, so it was never being received. The current API isn't great, mixing synchronous and non-synchronous things, not handling errors consistently, and generally is over complicated. That will be fixed soon. | |||
| 2019-08-20 | Split aws implementation from main.go in pipelinepreprocess | Nick White | |
| 2019-08-20 | Export qmsg type | Nick White | |
| 2019-08-19 | Fix pipelinepreprocess segfaults | Nick White | |
| These were caused by using non-pointer methods, which meant that the values set in Init() were not saved. | |||
| 2019-08-19 | Work in progress rearchitecture to use interfaces; currently pointers are screwy causing segfaults | Nick White | |
| 2019-08-13 | Various improvements to pipelinepreprocess | Nick White | |
| - Ensure temporary directory already being present isn't an issue - Remove temporary directory when done with it - Ensure any already preprocessed files aren't preprocessed themselves (this could happen in the case of a run stopping half way through) | |||
| 2019-08-13 | Correct typo in bucket name for pipelinepreprocess; tested and seems to work, remarkably | Nick White | |
| 2019-08-13 | Add bonus verbose log points | Nick White | |
| 2019-08-13 | Add booktopipeline tool (only lightly tested) | Nick White | |
| 2019-08-13 | Reduce SQS WaitTime to something in-spec, and add bonus verbose log points | Nick White | |
| 2019-08-13 | Switch ksizes to use by preprocmulti | Nick White | |
| 2019-08-13 | Add basic verbose logging capabilities to pipelinepreprocess | Nick White | |
| 2019-07-25 | Add first draft of pipelinepreprocess - completely untested, will contain bugs | Nick White | |
| 2019-07-19 | rename setupawspipeline to mkpipeline | Nick White | |
| 2019-07-19 | rename pipelineaws to setupawspipeline | Nick White | |
| 2019-07-19 | Add aws pipeline setup | Nick White | |
| 2019-06-25 | Remove 0.6 binarisation threshold option from preprocmulti | Nick White | |
| 2019-06-25 | Experimentally adjust wipe threshold according to binarisation level | Nick White | |
| 2019-06-11 | Name hocrs as pdfimages does, and preserve entities for hocr | Nick White | |
| 2019-06-11 | Add basic utility to turn an eebo xml into a set of hocr files (for hocr2pdf) | Nick White | |
| 2019-06-03 | Add option to disable wiping for preproc and preprocmulti | Nick White | |
| 2019-06-03 | Add -m option to wipe to set minimum content area for wipe to proceed | Nick White | |
| If content is very light or sparse it may be better to not wipe at all than wipe almost all of the content leaving a small strip. This is done now by aborting the wipe if the detected content takes up less than the minimum % of the page (default is 30%). | |||
| 2019-05-15 | Return an error if page average calculation cant be done with hocr | Nick White | |
| 2019-05-14 | Rewrite pgconf to be more accurate by measuring average word confidence rather than average line confidence | Nick White | |
| 2019-05-14 | pgconf: Don't print NaN if a page has no lines, and show the percentage, rather than float, for easier comparison | Nick White | |
| 2019-05-14 | Add pgconf tool that prints the overall confidence for a whole page of hocr | Nick White | |
| 2019-05-14 | Basic cleanup of preprocmulti | Nick White | |
| 2019-05-14 | gofmt | Nick White | |
| 2019-05-14 | Add preprocmulti tool, that outputs multiple binarisation options quickly | Nick White | |
| 2019-05-13 | Add preproc command, that binarises and preprocesses together | Nick White | |
| Surprisingly opening an image takes a significant amount of the total processing time, so this actually saves quite a bit of time in the grand scheme of things. | |||
| 2019-05-13 | Define flags in each test, so they arent erroneously picked up and used by cmds as they were defined in global package space | Nick White | |
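
The 2019-08-19 "Fix pipelinepreprocess segfaults" entry attributes the crashes to non-pointer methods, which meant that values set in `Init()` were not saved. The following is a minimal sketch of that general Go behaviour, not the repository's actual code: the type names `valueConn` and `pointerConn`, the `queueURL` field, and the URL are all hypothetical and exist only for illustration.

```go
package main

import "fmt"

// valueConn is a hypothetical type whose Init method uses a value receiver.
type valueConn struct {
	queueURL string
}

// Init receives a copy of the struct, so the assignment below changes
// only that copy and is discarded when the method returns.
func (c valueConn) Init(url string) {
	c.queueURL = url
}

// pointerConn is the same hypothetical type with a pointer receiver.
type pointerConn struct {
	queueURL string
}

// Init receives a pointer to the caller's struct, so the assignment
// is still visible after the method returns.
func (c *pointerConn) Init(url string) {
	c.queueURL = url
}

func main() {
	var v valueConn
	v.Init("https://example.invalid/queue")
	fmt.Printf("value receiver:   %q\n", v.queueURL) // prints ""

	var p pointerConn
	p.Init("https://example.invalid/queue")
	fmt.Printf("pointer receiver: %q\n", p.queueURL) // prints the URL
}
```

If the field left unset is itself a pointer (for example a client handle), it stays nil after a value-receiver `Init()`, and a later dereference panics. That would be consistent with the segfaults described in the log, though the log does not say which fields were affected.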
