path: root/cmd/bookpipeline
Age         Author      Commit message
2019-12-13  Nick White  Mention training in ocr error message
2019-12-13  Nick White  Print stdout and stderr output when tesseract fails
2019-12-11  Nick White  Fix typo incorrectly screwing up PDFs
2019-12-11  Nick White  Clarify use of -training in tools
2019-12-11  Nick White  Clean up and correct book name parsing in the pipeline, and update usage of getpipelinebook
2019-12-11  Nick White  Add ability to set a different training for the ocr job
2019-12-06  Nick White  Don't abort PDF generation if pages aren't found, just do the best that can be done and move on; not all books will have all page types (such as wipeonly books)
2019-12-05  Nick White  Fix the PDF in analyse step part of bookpipeline
2019-12-05  Nick White  Add pdf generation to analyse step (untested)
2019-12-03  Nick White  Don't pause between OCR page jobs; this should save us significant amounts of time when there are large numbers of pages
2019-11-19  Nick White  Send pages to the individual OCR Page queue by default
                        This now concludes the OCR Page queue stuff; it should all be working out of the box now.
2019-11-19  Nick White  Add ocrpage queue for processing individual pages
                        This should be a good way to get around the ongoing heartbeat issue, as individual page jobs will never come close to the 12 hour mark that can cause the bug. The OCR page processing is done and working now; still to do is to populate the queue (rather than the ocr queue) after preprocessing / wiping.
2019-10-29  Nick White  Print heartbeat error on failure
2019-10-29  Nick White  Debugging: kill process immediately a heartbeat error is detected (systemd will restart it soon thereafter)
2019-10-28  Nick White  Try to fix heartbeat renew issue more fully
                        This approach first sets the remaining visibility timeout to zero. This should ensure that the message is available to re-find as soon as the process looks for it. Correspondingly, the delay between checks is much shorter, as there shouldn't be a reason for much delay.
2019-10-09  Nick White  Match prebinarised presegmented output from ocropus in wipepattern (named like "010001.bin.png")
2019-10-08  Nick White  Update paths of other rescribe imports
2019-10-08  Nick White  Separate out bookpipeline from catch-all go.git repo, and rename to rescribe.xyz/bookpipeline
                        The dependencies from the go.git repo will follow in due course.