path: root/cmd/bookpipeline/main.go
Age  Commit message  Author
2020-05-22  Fix bookpipeline failing if shutdown option isn't used  Nick White
2020-04-14  Briefly document each of the commands in a godoc friendly way, and improve the cloudsettings documentation slightly  Nick White
2020-04-07  Remove unused OCR queue (was superseded by the ocrpage queue some time ago)  Nick White
2020-04-07  gofmt  Nick White
2020-03-31  Disable autoshutdown by default for bookpipeline, and update to ami 0.11 (which reenables it for spot instances)  Nick White
2020-03-31  [bookpipeline] Fix typo in previous commit and rename HeartbeatTime to HeartbeatSeconds, as it is not a Time  Nick White
2020-03-31  [bookpipeline] Stop using filepath.Join for storage keys, as we want to ensure it is always a / delimiter  Nick White
2020-03-31  [bookpipeline] Improve logging output  Nick White
2020-03-31  [bookpipeline] Add (experimental) log saving functionality  Nick White
2020-03-30  [bookpipeline] Clean up autoshutdown  Nick White
2020-03-30  [bookpipeline] Enable real shutdown when bookpipeline has been idle for 5 minutes  Nick White
2020-03-30  [bookpipeline] Neaten shutdown fix  Nick White
2020-03-30  [bookpipeline] Fix hang bug when restarting shutdown timer  Nick White
2020-03-30  Rewrite autoshutdown to do things right [bugs excluded] (wip)  Nick White
2020-03-24  [bookpipeline] Improve autoshutdown wip  Nick White
2020-03-24  [bookpipeline] Add experimental (dummy) shutdown part  Nick White
2020-03-23  Add Log() function to Pipeliner interface  Nick White
    This simplifies things nicely from using conn.GetLogger().Println() to conn.Log()
2020-03-23  Replace errors.New(fmt.Sprintf with fmt.Errorf  Nick White
    Embarrassing I hadn't noticed the fmt.Errorf function before, but better late than never.
2020-03-23  Don't try to make a graph with one line (it will fail), and don't mark analysis as failed if graph isn't made for that reason  Nick White
2020-02-27  Add documentation, license notices, and license  Nick White
2020-02-05  Fix allOCRed for wipeonly books (hopefully)  Nick White
    allOCRed was checking for wipePattern files, however they should have been transformed into the regular preprocessedPattern for OCR anyway, so shouldn't have been directly OCRed. Thus, allOCRed was mistakenly looking for .hocr versions of the original wipePattern files, which never would have been produced.
2019-12-13  Hopefully fix empty training bug  Nick White
2019-12-13  Mention training in ocr error message  Nick White
2019-12-13  Print stdout and stderr output when tesseract fails  Nick White
2019-12-11  Fix typo incorrectly screwing up PDFs  Nick White
2019-12-11  Clarify use of -training in tools  Nick White
2019-12-11  Clean up and correct book name parsing in the pipeline, and update usage of getpipelinebook  Nick White
2019-12-11  Add ability to set a different training for the ocr job  Nick White
2019-12-06  Don't abort PDF generation if pages aren't found, just do the best that can be done and move on; not all books will have all page types (such as wipeonly books)  Nick White
2019-12-05  Fix the PDF in analyse step part of bookpipeline  Nick White
2019-12-05  Add pdf generation to analyse step (untested)  Nick White
2019-12-03  Don't pause between OCR page jobs; this should save us significant amounts of time when there are large numbers of pages  Nick White
2019-11-19  Send pages to the individual OCR Page queue by default  Nick White
    This now concludes the OCR Page queue stuff; it should all be working out of the box now.
2019-11-19  Add ocrpage queue for processing individual pages  Nick White
    This should be a good way to get around the ongoing heartbeat issue, as individual page jobs will never come close to the 12 hour mark that can cause the bug. The OCR page processing is done and working now; still to do is to populate the queue (rather than the ocr queue) after preprocessing / wiping.
2019-10-29  Print heartbeat error on failure  Nick White
2019-10-29  Debugging: kill process immediately a heartbeat error is detected (systemd will restart it soon thereafter)  Nick White
2019-10-28  Try to fix heartbeat renew issue more fully  Nick White
    This approach first sets the remaining visibility timeout to zero. This should ensure that the message is available to re-find as soon as the process looks for it. Correspondingly the delay between checks is much shorter, as there shouldn't be a reason for much delay.
2019-10-09  Match prebinarised presegmented output from ocropus in wipepattern (named like "010001.bin.png")  Nick White
2019-10-08  Update paths of other rescribe imports  Nick White
2019-10-08  Separate out bookpipeline from catch-all go.git repo, and rename to rescribe.xyz/bookpipeline  Nick White
    The dependencies from the go.git repo will follow in due course.
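
Two of the 2020-03 entries in this log describe small Go idiom changes: joining storage keys so the delimiter is always "/", and replacing errors.New(fmt.Sprintf(...)) with fmt.Errorf. The following is a minimal sketch of both, not code taken from bookpipeline. The helper names (storageKey, oldStyleError, newStyleError), the error text and the example key are all hypothetical; the use of path.Join is an assumption, since the commit only says filepath.Join was dropped and does not say what replaced it.

```go
package main

import (
	"errors"
	"fmt"
	"path"
)

// storageKey is a hypothetical helper. path.Join always joins with "/",
// whereas filepath.Join uses the operating system's separator, which is
// not wanted for object storage keys.
func storageKey(bookname, filename string) string {
	return path.Join(bookname, filename)
}

// oldStyleError shows the pattern the log says was replaced.
func oldStyleError(name string) error {
	return errors.New(fmt.Sprintf("no pages found for book %s", name))
}

// newStyleError builds the same message with a single fmt.Errorf call.
func newStyleError(name string) error {
	return fmt.Errorf("no pages found for book %s", name)
}

func main() {
	fmt.Println(storageKey("mybook", "0001.png"))
	fmt.Println(oldStyleError("mybook").Error() == newStyleError("mybook").Error())
}
```

Both error helpers produce the same message, so the fmt.Errorf form is purely a simplification.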