bookpipeline - Tools to process books in a cloud-based pipeline system

Date	Commit message	Author
2020-03-23	[getpipelinebook] Add -binarisedpdf and -colourpdf flags	Nick White

2020-03-23	[getpipelinebook] Add -graph flag to download just graphs	Nick White

2020-03-10	Update image used for spots	Nick White

2020-03-09	Add nobooks flag to lspipeline so it has a faster mode	Nick White

2020-02-28	Tidy go.sum	Nick White

2020-02-28	Update go-chart to a working version in go.mod (tagged v0.1.2)	Nick White

2020-02-27	Update rescribe.xyz/utils dependency (tagged v0.1.1)	Nick White

2020-02-27	Remove fonttobytes (use the one in rescribe.xyz/utils repo instead)	Nick White

2020-02-27	Update go.mod	Nick White

2020-02-27	Fix Sprintf usage	Nick White

2020-02-27	Add documentation, license notices, and license	Nick White

2020-02-27	Improve usage description of confgraph and pagegraph	Nick White

2020-02-05	Fix allOCRed for wipeonly books (hopefully)	Nick White
	allOCRed was checking for wipePattern files; however, those should already have been transformed into files matching the regular preprocessedPattern before OCR, so they would never have been OCRed directly. Thus, allOCRed was mistakenly looking for .hocr versions of the original wipePattern files, which would never have been produced.
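The corrected check described above can be sketched in Go, the language of this repository. This is a minimal, hypothetical sketch and not the pipeline's actual code: it takes a plain list of object names instead of querying storage, and the regular expression standing in for preprocessedPattern, as well as the example file names, are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// preprocessedPattern is a hypothetical stand-in for the pipeline's real
// pattern; here it matches binarised page images such as "0001_bin0.2.png".
var preprocessedPattern = regexp.MustCompile(`_bin[0-9]\.[0-9]\.png$`)

// allOCRed reports whether every preprocessed page image in objs has a
// matching .hocr file. Wiped-only images are deliberately not checked,
// since they are expected to become preprocessed images before OCR.
func allOCRed(objs []string) bool {
	present := make(map[string]bool, len(objs))
	for _, o := range objs {
		present[o] = true
	}
	atLeastOne := false
	for _, o := range objs {
		if !preprocessedPattern.MatchString(o) {
			continue // not a page image that is sent to OCR
		}
		atLeastOne = true
		hocr := strings.TrimSuffix(o, ".png") + ".hocr"
		if !present[hocr] {
			return false // this page has not been OCRed yet
		}
	}
	// A book with no preprocessed pages at all is not considered done.
	return atLeastOne
}

func main() {
	objs := []string{
		"book/0001_bin0.2.png", "book/0001_bin0.2.hocr",
		"book/0002_bin0.2.png", "book/0002_bin0.2.hocr",
		"book/0001_wipe.png", // hypothetical wiped image name; no .hocr expected
	}
	fmt.Println(allOCRed(objs))     // prints "true"
	fmt.Println(allOCRed(objs[:3])) // prints "false": 0002 has no .hocr
}
```

Because the loop only inspects names matching the preprocessed pattern, leftover wiped images no longer block completion, which is the behaviour the commit message describes.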
2020-01-22	[pagegraph] Stop printing debug output	Nick White

2020-01-22	[pagegraph] Fix bug where word graphs weren't stable as their number wasn't parsed by graph, and add line or word option	Nick White
2020-01-22	Make pagegraph use lines again	Nick White

2020-01-22	Remove unused function in pagegraph	Nick White

2020-01-21	Add pagegraph tool	Nick White

2019-12-17	Add png flag to getpipelinebook	Nick White

2019-12-17	Add pdf flag to getpipelinebook	Nick White

2019-12-16	Fix error message syntax in getpipelinebook	Nick White

2019-12-16	Add a new tool, addtoqueue, which can be used to generically add any message to any queue	Nick White
2019-12-16	Fix usage message for getpipelinebook, and trim final slashes in lspipeline output	Nick White
2019-12-13	Update StartInstance to point to the newest image	Nick White

2019-12-13	Hopefully fix empty training bug	Nick White

2019-12-13	Mention training in ocr error message	Nick White

2019-12-13	Print stdout and stderr output when tesseract fails	Nick White

2019-12-11	Add addtoanalysequeue tool, which is useful for debugging	Nick White

2019-12-11	Fix typo incorrectly screwing up PDFs	Nick White

2019-12-11	Clarify use of -training in tools	Nick White

2019-12-11	Clean up and correct book name parsing in the pipeline, and update usage of getpipelinebook	Nick White
2019-12-11	Add ability to set a different training for the ocr job	Nick White

2019-12-11	Use aws.go with mkpipeline too, plus fix one log.Fatal call in aws.go which should have been handled by the caller	Nick White
2019-12-06	Don't abort PDF generation if pages aren't found; just do the best that can be done and move on, as not all books will have all page types (such as wipeonly books)	Nick White
2019-12-05	Remove (the generally empty) files in the case of a failed download	Nick White

2019-12-05	Default getpipelinebook to downloading pdfs instead of images	Nick White

2019-12-05	Fix the PDF part of the analyse step in bookpipeline	Nick White

2019-12-05	Add pdf generation to analyse step (untested)	Nick White

2019-12-03	Rewrite the book listing part of lspipeline to be much faster by taking advantage of the AWS CommonPrefixes output	Nick White
2019-12-03	Don't pause between OCR page jobs; this should save us significant amounts of time when there are large numbers of pages	Nick White
2019-11-29	Make error message state clearly which page is causing issues	Nick White

2019-11-26	Improve usage notice	Nick White

2019-11-26	Ensure error in file walking is correctly returned	Nick White

2019-11-20	Add x/image to go.mod	Nick White

2019-11-20	Merge branch 'addpdf'	Nick White

2019-11-20	Implement image resizing option in PDF generation, so that smaller PDFs can be generated	Nick White
2019-11-19	Send pages to the individual OCR Page queue by default	Nick White
	This now concludes the OCR Page queue stuff; it should all be working out of the box now.
2019-11-19	Add ocrpage queue for processing individual pages	Nick White
	This should be a good way to get around the ongoing heartbeat issue, as individual page jobs will never come close to the 12-hour mark that can cause the bug. The OCR page processing is done and working now; what remains is to populate this queue (rather than the ocr queue) after preprocessing / wiping.
2019-11-12	Merge branch 'addpdf'	Nick White

2019-11-12	Embed a font, compressed, into the binary	Nick White