bookpipeline - Tools to process books in a cloud based pipeline system

Age	Commit message	Author
2021-12-20	whitespace and error clarity changes	Nick White

2021-12-20	fixed -png flag and changed rescribe tool to save binarized png in separate folder	Antonia Rescribe

2021-12-20	rescribe: Include stderr in log area, and ensure button is re-enabled on failure	Nick White

2021-12-06	pipeline: process jpg or png regardless of whether in wipe or preprocess queue	Nick White

2021-11-23	rescribe: Remove debugging printfs related to PDF parsing	Nick White

2021-11-23	rescribe: Improve pdf consumption by ensuring only jpg or png are saved to upload	Nick White

2021-11-23	gofmt, plus update documentation of recently changed pipeline.UploadImages	Nick White

2021-11-22	rescribe: Add support for reading images directly from PDFs	Nick White
	There are several TODO items before this can be considered "good enough", let alone complete. See the comments in the code for details. On a good day, with a fair wind, though, this works.
2021-11-22	rescribe: replace errors.New with fmt.Errorf	Nick White

2021-11-09	lspipeline-ng: Remove debugging printf	Nick White

2021-11-02	rescribe: handle directories with spaces correctly	Nick White

2021-10-26	rescribe: Separate gui code, and organise it better (should be no functional change)	Nick White

2021-10-25	rescribe: wip gui using fyne	Nick White

2021-10-12	rescribe: fix lookup of external training file (tag: v0.5.3)	Nick White

2021-10-01	rescribe: Include new tessdata in embed getter (tag: v0.5.2)	Nick White

2021-10-01	rescribe: Add embedded lat.traineddata	Nick White

2021-10-01	rescribe: Add both original training path and embedded version on error output for training file not found, so that it's clear that the file specified may not exist	Nick White

2021-08-24	rescribe: improve makefile to match the way we deploy to the website	Nick White

2021-08-19	lspipeline-ng: Limit number of book details requests so we don't run into EC2's rate limiting (tag: v0.5.0)	Nick White

2021-08-18	rescribe: Update documentation on how to deal with M1 signing, and move makefile to where it makes sense	Nick White

2021-08-17	pipeline: use regular storage for tests, rather than a separate one	Nick White

2021-08-02	rescribe: Add experimental m1 build	Nick White

2021-08-02	internal/pipeline: Add test (incomplete but working) for UploadImages	Nick White

2021-07-20	Cleanup thanks to go vet	Nick White

2021-07-13	gofmt	Nick White

2021-07-12	Add necessary pipeliner dependency for testqueue (probably remove this from internal library later as it's only needed for tests)	Nick White

2021-07-12	Add test for upAndQueue function	Nick White
	This involved adding a test queue, so it can be run safely without interfering with the pipeline.
2021-07-08	rescribe: Exit with an error if directory doesn't exist	Nick White

2021-06-29	rescribe: add documentation on how to generate embedded data	Nick White

2021-06-29	rescribe: Add embed target for darwin (osx) too	Nick White

2021-06-22	rescribe: Remove erroneous unnecessary mkdir	Nick White

2021-06-22	rescribe: Make it clearer that embedded training files are available to use	Nick White

2021-06-22	rescribe: add embedded tesseract for linux	Nick White

2021-06-22	rescribe: allow use of embedded training even if -systess is used	Nick White

2021-06-22	rescribe: Add go generate command to download the needed files to embed	Nick White

2021-06-22	rescribe: Add an embedded tessdata	Nick White

2021-06-21	rescribe: Set up so only Tesseract needed for the build platform is embedded	Nick White

2021-06-21	rescribe: Embed Tesseract into binary so that no Tesseract install is necessary	Nick White

2021-05-31	Fix bug after changing pipeliner for tests, to ensure DeleteObjects is available to Pipeliner	Nick White

2021-03-16	rescribe: change default training directory to trainings/v0.3.3	Nick White

2021-02-22	lspipeline: Rename to lspipeline-ng, and restore pre concurrency version to lspipeline as there are some hard to debug issues in concurrency version	Nick White

2021-02-15	getsamplepages: Add -prefix option, and use 'best' to get random page numbers	Nick White
	The -prefix option is useful to us. Previously only a .jpg for page number 100 was retrieved, which failed if the book had fewer (or unusually named) pages, and also didn't provide a corresponding .hocr at all (bug introduced with 48958d2). Using 'best', which is (effectively) randomly sorted, provides a page that is guaranteed to exist, and a random one at that.
2021-01-26	Make ListObjectsWithMeta generic again and create a specialised ListObjectWithMeta for single file listing, so we can still be as fast, but do not have a misleading api	Nick White

2021-01-26	Improve lspipeline concurrency by removing WaitGroup stuff	Nick White

2021-01-26	Speed up lspipeline by making s3 requests concurrently and only processing single results from ListObjects requests	Nick White

2020-12-15	[rmbook] Append / to end of bookname, to ensure e.g. "1" doesn't match all books starting with "1"	Nick White

2020-12-15	[rmbook] Add -dryrun flag	Nick White

2020-12-14	Add rmbook tool	Nick White

2020-12-07	[rescribe] Fix up *.hocr glob, which ensures that using a savedir that already has a hocr directory in it will work (tag: v0.3.2)	Nick White

2020-12-07	[rescribe] Allow saving of results to somewhere other than a directory named after the book being processed	Nick White