Age | Commit message | Author
2019-12-16 | Fix usage message for getpipelinebook, and trim final slashes in lspipeline output | Nick White
2019-12-13 | Hopefully fix empty training bug | Nick White
2019-12-13 | Mention training in ocr error message | Nick White
2019-12-13 | Print stdout and stderr output when tesseract fails | Nick White
2019-12-11 | Add addtoanalysequeue tool, which is useful for debugging | Nick White
2019-12-11 | Fix typo incorrectly screwing up PDFs | Nick White
2019-12-11 | Clarify use of -training in tools | Nick White
2019-12-11 | Clean up and correct book name parsing in the pipeline, and update usage of getpipelinebook | Nick White
2019-12-11 | Add ability to set a different training for the ocr job | Nick White
2019-12-11 | Use aws.go with mkpipeline too, plus fix one log.Fatal call in aws.go which should have been handled by caller | Nick White
2019-12-06 | Don't abort PDF generation if pages aren't found, just do the best that can be done and move on; not all books will have all page types (such as wipeonly books) | Nick White
2019-12-05 | Default getpipelinebook to downloading pdfs instead of images | Nick White
2019-12-05 | Fix the PDF in analyse step part of bookpipeline | Nick White
2019-12-05 | Add pdf generation to analyse step (untested) | Nick White
2019-12-03 | Rewrite lspipeline book listing part to be much faster by taking advantage of the aws CommonPrefixes output | Nick White
2019-12-03 | Don't pause between OCR page jobs; this should save us significant amounts of time when there are large numbers of pages | Nick White
2019-11-29 | Make error message clear what page is causing issues | Nick White
2019-11-26 | Improve usage notice | Nick White
2019-11-26 | Ensure error in file walking is correctly returned | Nick White
2019-11-20 | Merge branch 'addpdf' | Nick White
2019-11-20 | Implement image resizing option into PDF generation, so that smaller PDFs to be generated | Nick White
2019-11-19 | Send pages to the individual OCR Page queue by default | Nick White
    This now concludes the OCR Page queue stuff; it should all be working out of the box now.
2019-11-19 | Add ocrpage queue for processing individual pages | Nick White
    This should be a good way to get around the ongoing heartbeat issue, as individual page jobs will never come close to a the 12 hour mark that can cause the bug. The OCR page processing is done and working now, still to do is to populate the queue (rather than the ocr queue) after preprocessing / wiping.
2019-11-12 | Fix sleep in unstickocr | Nick White
2019-11-12 | Add unstickocr tool, until the heartbeat bug is eliminated | Nick White
2019-11-12 | Add spotme command to start appropriate spot instances | Nick White
2019-11-01 | Compress the font with zlib, and include it in repo | Nick White
2019-10-31 | Add capability to embed font files into tool | Nick White
2019-10-31 | PDF: add functionality to use "best" file if it exists | Nick White
2019-10-31 | Add flag to switch between binarised and colour output | Nick White
2019-10-31 | Move PDF handling code to a separate file | Nick White
2019-10-31 | Many improvements to pdfbook; basically working now | Nick White
2019-10-31 | Add work in progress PDF producer | Nick White
2019-10-29 | Print heartbeat error on failure | Nick White
2019-10-29 | Debugging: kill process immediately a heartbeat error is detected (systemd will restart it soon thereafter) | Nick White
2019-10-28 | Try to fix heartbeat renew issue more fully | Nick White
    This approach first sets the remaining visibility timeout to zero. This should ensure that the message is available to re-find as soon as the process looks for it. Correspondingly the delay between checks is much shorter, as there shouldn't be a reason for much delay.
2019-10-23 | getpipelinebook: default to downloading corresponding page images, and add option to download the original page images too | Nick White
2019-10-16 | Rewrite booktopipeline to use bookpipeline aws interface | Nick White
2019-10-16 | Sort book list in lspipeline by modified date | Nick White
2019-10-16 | Ensure booktopipeline complains if given too many arguments | Nick White
2019-10-16 | Another attempted fix to "too many open files" issue | Nick White
2019-10-16 | Ensure files are promptly closed by booktopipeline | Nick White
2019-10-09 | Make confgraph and graph in general more resilient to bad input | Nick White
2019-10-09 | Match prebinarised presegmented output from ocropus in wipepattern (named like "010001.bin.png") | Nick White
2019-10-08 | Update paths of other rescribe imports | Nick White
2019-10-08 | Separate out bookpipeline from catch-all go.git repo, and rename to rescribe.xyz/bookpipeline | Nick White
    The dependencies from the go.git repo will follow in due course.