bookpipeline - Tools to process books in a cloud based pipeline system

Age	Commit message	Author
2019-12-13	Hopefully fix empty training bug	Nick White

2019-12-13	Mention training in ocr error message	Nick White

2019-12-13	Print stdout and stderr output when tesseract fails	Nick White

2019-12-11	Fix typo incorrectly screwing up PDFs	Nick White

2019-12-11	Clarify use of -training in tools	Nick White

2019-12-11	Clean up and correct book name parsing in the pipeline, and update usage of getpipelinebook	Nick White

2019-12-11	Add ability to set a different training for the ocr job	Nick White

2019-12-06	Don't abort PDF generation if pages aren't found, just do the best that can be done and move on; not all books will have all page types (such as wipeonly books)	Nick White

2019-12-05	Fix the PDF in analyse step part of bookpipeline	Nick White

2019-12-05	Add pdf generation to analyse step (untested)	Nick White

2019-12-03	Don't pause between OCR page jobs; this should save us significant amounts of time when there are large numbers of pages	Nick White

2019-11-19	Send pages to the individual OCR Page queue by default	Nick White
	This now concludes the OCR Page queue stuff; it should all be working out of the box now.
2019-11-19	Add ocrpage queue for processing individual pages	Nick White
	This should be a good way to get around the ongoing heartbeat issue, as individual page jobs will never come close to the 12 hour mark that can cause the bug. The OCR page processing is done and working now; still to do is to populate the queue (rather than the ocr queue) after preprocessing / wiping.
2019-10-29	Print heartbeat error on failure	Nick White

2019-10-29	Debugging: kill process immediately a heartbeat error is detected (systemd will restart it soon thereafter)	Nick White

2019-10-28	Try to fix heartbeat renew issue more fully	Nick White
	This approach first sets the remaining visibility timeout to zero. This should ensure that the message is available to re-find as soon as the process looks for it. Correspondingly the delay between checks is much shorter, as there shouldn't be a reason for much delay.
2019-10-09	Match prebinarised presegmented output from ocropus in wipepattern (named like "010001.bin.png")	Nick White

2019-10-08	Update paths of other rescribe imports	Nick White

2019-10-08	Separate out bookpipeline from catch-all go.git repo, and rename to rescribe.xyz/bookpipeline	Nick White
	The dependencies from the go.git repo will follow in due course.