Age | Commit message | Author | |
---|---|---|---|
2019-12-13 | Hopefully fix empty training bug | Nick White | |
2019-12-13 | Mention training in ocr error message | Nick White | |
2019-12-13 | Print stdout and stderr output when tesseract fails | Nick White | |
2019-12-11 | Fix typo incorrectly screwing up PDFs | Nick White | |
2019-12-11 | Clarify use of -training in tools | Nick White | |
2019-12-11 | Clean up and correct book name parsing in the pipeline, and update usage of getpipelinebook | Nick White | |
2019-12-11 | Add ability to set a different training for the ocr job | Nick White | |
2019-12-06 | Don't abort PDF generation if pages aren't found, just do the best that can be done and move on; not all books will have all page types (such as wipeonly books) | Nick White | |
2019-12-05 | Fix the PDF in analyse step part of bookpipeline | Nick White | |
2019-12-05 | Add pdf generation to analyse step (untested) | Nick White | |
2019-12-03 | Don't pause between OCR page jobs; this should save us significant amounts of time when there are large numbers of pages | Nick White | |
2019-11-19 | Send pages to the individual OCR Page queue by default | Nick White | |
This now concludes the OCR Page queue stuff; it should all be working out of the box now. | |||
2019-11-19 | Add ocrpage queue for processing individual pages | Nick White | |
This should be a good way to get around the ongoing heartbeat issue, as individual page jobs will never come close to the 12 hour mark that can cause the bug. The OCR page processing is done and working now; still to do is to populate the ocrpage queue (rather than the ocr queue) after preprocessing / wiping. | |||
2019-10-29 | Print heartbeat error on failure | Nick White | |
2019-10-29 | Debugging: kill process immediately a heartbeat error is detected (systemd will restart it soon thereafter) | Nick White | |
2019-10-28 | Try to fix heartbeat renew issue more fully | Nick White | |
This approach first sets the remaining visibility timeout to zero. This should ensure that the message is available to re-find as soon as the process looks for it. Correspondingly the delay between checks is much shorter, as there shouldn't be a reason for much delay. | |||
2019-10-09 | Match prebinarised presegmented output from ocropus in wipepattern (named like "010001.bin.png") | Nick White | |
2019-10-08 | Update paths of other rescribe imports | Nick White | |
2019-10-08 | Separate out bookpipeline from catch-all go.git repo, and rename to rescribe.xyz/bookpipeline | Nick White | |
The dependencies from the go.git repo will follow in due course. | |||