Age | Commit message | Author
2020-03-23 | [getpipelinebook] Add -binarisedpdf and -colourpdf flags | Nick White
2020-03-23 | [getpipelinebook] Add -graph flag to download just graphs | Nick White
2020-03-10 | Update image used for spots | Nick White
2020-03-09 | Add nobooks flag to lspipeline so it has a faster mode | Nick White
2020-02-28 | Tidy go.sum | Nick White
2020-02-28 | Update go-chart to a working version in go.mod (tag: v0.1.2) | Nick White
2020-02-27 | Update rescribe.xyz/utils dependency (tag: v0.1.1) | Nick White
2020-02-27 | Remove fonttobytes (use the one in rescribe.xyz/utils repo instead) | Nick White
2020-02-27 | Update go.mod | Nick White
2020-02-27 | Fix Sprintf usage | Nick White
2020-02-27 | Add documentation, license notices, and license | Nick White
2020-02-27 | Improve usage description of confgraph and pagegraph | Nick White
2020-02-05 | Fix allOCRed for wipeonly books (hopefully) | Nick White
    allOCRed was checking for wipePattern files, however they should have been transformed into the regular preprocessedPattern for OCR anyway, so shouldn't have been directly OCRed. Thus, allOCRed was mistakenly looking for .hocr versions of the original wipePattern files, which never would have been produced.
2020-01-22 | [pagegraph] Stop printing debug output | Nick White
2020-01-22 | [pagegraph] Fix bug where word graphs werent stable as their number wasnt parsed by graph, and add line or word option | Nick White
2020-01-22 | Make pagegraph use lines again | Nick White
2020-01-22 | Remove unused function in pagegraph | Nick White
2020-01-21 | Add pagegraph tool | Nick White
2019-12-17 | Add png flag to getpipelinebook | Nick White
2019-12-17 | Add pdf flag to getpipelinebook | Nick White
2019-12-16 | Fix error message syntax in getpipelinebook | Nick White
2019-12-16 | Add a new tool, addtoqueue, which can be used to generically add any message to any queue | Nick White
2019-12-16 | Fix usage message for getpipelinebook, and trim final slashes in lspipeline output | Nick White
2019-12-13 | Update StartInstance to point to the newest image | Nick White
2019-12-13 | Hopefully fix empty training bug | Nick White
2019-12-13 | Mention training in ocr error message | Nick White
2019-12-13 | Print stdout and stderr output when tesseract fails | Nick White
2019-12-11 | Add addtoanalysequeue tool, which is useful for debugging | Nick White
2019-12-11 | Fix typo incorrectly screwing up PDFs | Nick White
2019-12-11 | Clarify use of -training in tools | Nick White
2019-12-11 | Clean up and correct book name parsing in the pipeline, and update usage of getpipelinebook | Nick White
2019-12-11 | Add ability to set a different training for the ocr job | Nick White
2019-12-11 | Use aws.go with mkpipeline too, plus fix one log.Fatal call in aws.go which should have been handled by caller | Nick White
2019-12-06 | Don't abort PDF generation if pages aren't found, just do the best that can be done and move on; not all books will have all page types (such as wipeonly books) | Nick White
2019-12-05 | Remove (the generally empty) files in the case of a failed download | Nick White
2019-12-05 | Default getpipelinebook to downloading pdfs instead of images | Nick White
2019-12-05 | Fix the PDF in analyse step part of bookpipeline | Nick White
2019-12-05 | Add pdf generation to analyse step (untested) | Nick White
2019-12-03 | Rewrite lspipeline book listing part to be much faster by taking advantage of the aws CommonPrefixes output | Nick White
2019-12-03 | Don't pause between OCR page jobs; this should save us significant amounts of time when there are large numbers of pages | Nick White
2019-11-29 | Make error message clear what page is causing issues | Nick White
2019-11-26 | Improve usage notice | Nick White
2019-11-26 | Ensure error in file walking is correctly returned | Nick White
2019-11-20 | Add x/image to go.mod | Nick White
2019-11-20 | Merge branch 'addpdf' | Nick White
2019-11-20 | Implement image resizing option into PDF generation, so that smaller PDFs to be generated | Nick White
2019-11-19 | Send pages to the individual OCR Page queue by default | Nick White
    This now concludes the OCR Page queue stuff; it should all be working out of the box now.
2019-11-19 | Add ocrpage queue for processing individual pages | Nick White
    This should be a good way to get around the ongoing heartbeat issue, as individual page jobs will never come close to a the 12 hour mark that can cause the bug. The OCR page processing is done and working now, still to do is to populate the queue (rather than the ocr queue) after preprocessing / wiping.
2019-11-12 | Merge branch 'addpdf' | Nick White
2019-11-12 | Embed a font, compressed, into the binary | Nick White