bookpipeline - Tools to process books in a cloud based pipeline system

Age	Commit message	Author
2022-01-17	rescribe: Surface errors properly, with a dialogue box	Nick White

2022-01-17	rescribe: Show friendly names for the trainings, and hide "osd" training	Nick White

2022-01-17	internal/pipeline: if a graph cannot be created, don't leave an empty ↵	Nick White
	graph.png file, and allow failure to download that as it won't be created in the case of a 1 page book, which is fine
2022-01-10	rescribe: Increase size of file & folder picker dialog windows	Nick White

2022-01-10	rescribe: Put log in an accordion, disable buttons when processing, and ↵	Nick White
	don't lock gui when processing
2022-01-10	rescribe: ensure books with a space in the name are handled correctly in the gui	Nick White

2022-01-10	rescribe: Rename PDFs taking into account that in some cases one or the ↵	Nick White
	other of binarised or colour may not exist
2022-01-10	internal/pipeline: Have DownloadPdfs() try to download all PDFs, but only ↵	Nick White
	return an error if none downloaded, as there are times when the colour PDF will not exist, which is fine
2022-01-10	rescribe: handle PDF errors much more gracefully	Nick White

2022-01-04	rescribe: parse stdout and set progress bar based on it, using appropriate ↵	Nick White
	labels for the progress bar text to show what's being done
2022-01-04	rescribe: Restrict file types to select for .pdf and .traineddata file pickers	Nick White

2022-01-04	rescribe: add select box to choose training to use, including an Other... option	Nick White

2021-12-20	rescribe: Ensure temporary tesseract data is only removed when the program ↵	Nick White
	ends, so multiple books can be processed by the gui one after the other
2021-12-20	rescribe: Improve layout of gui, and make dir entry box read only	Nick White

2021-12-20	rescribe: Ensure temporary tesseract dir is removed in gui mode too	Nick White

2021-12-20	rescribe: add "Choose PDF" button, and make chosen dir/file section a label ↵	Nick White
	rather than an entry
2021-12-20	whitespace and error clarity changes	Nick White

2021-12-20	fixed -png flag and changed rescribe tool to save binarized png in separate ↵	Antonia Rescribe
	folder
2021-12-20	rescribe: Include stderr in log area, and ensure button is re-enabled on failure	Nick White

2021-12-06	Update cloud settings to bookpipeline-v18.3	Nick White

2021-12-06	pipeline: process jpg or png regardless of whether in wipe or preprocess queue	Nick White

2021-12-06	graph: make number page parsing much more robust, and ensure fake numbers ↵	Nick White
	are used to create a coherent graph if any page numbers cannot be found from file names
2021-12-06	pipeline: ignore any files with a non-image suffix, rather than erroring on them	Nick White

2021-11-23	rescribe: Remove debugging printfs related to PDF parsing	Nick White

2021-11-23	rescribe: Improve pdf consumption by ensuring only jpg or png are saved to ↵	Nick White
	upload
2021-11-23	gofmt, plus update documentation of recently changed pipeline.UploadImages	Nick White

2021-11-22	internal/pipeline: remove old and broken requirement for TestStorageId()	Nick White

2021-11-22	changed put.go so that a 4-digit number is appended to the end of each ↵	Antonia Rescribe
	filename when images are uploaded to the pipeline
2021-11-22	rescribe: Add support for reading images directly from PDFs	Nick White
	There are several TODO items before this can be considered "good enough", let alone complete. See the comments in the code for details. On a good day, with a fair wind, though, this works.
2021-11-22	rescribe: replace errors.New with fmt.Errorf	Nick White

2021-11-20	update spot image again	Nick White

2021-11-20	Update spot image to v18.0	Nick White

2021-11-20	Enable fyne gui again	Nick White

2021-11-09	lspipeline-ng: Remove debugging printf	Nick White

2021-11-02	rescribe: handle directories with spaces correctly	Nick White

2021-10-29	Temporarily disable fyne module, as it causes issues with go1.11 build	Nick White

2021-10-26	rescribe: Separate gui code, and organise it better (should be no functional ↵	Nick White
	change)
2021-10-25	rescribe: wip gui using fyne	Nick White

2021-10-12	rescribe: fix lookup of external training file [v0.5.3]	Nick White

2021-10-01	rescribe: Include new tessdata in embed getter [v0.5.2]	Nick White

2021-10-01	rescribe: Add embedded lat.traineddata	Nick White

2021-10-01	rescribe: Add both original training path and embedded version on error ↵	Nick White
	output for training file not found, so that it's clear that the file specified may not exist
2021-08-30	pdf: Always encode images as jpeg [v0.5.1]	Nick White
	Previously, for PDFs using binarised images we kept them as PNG, but there's no good reason to do so; it's better to just get the space savings on offer from jpeg.
2021-08-30	adjusted the height of the image in the pdf to 1000px if the smaller option ↵	Antonia Rescribe
	is chosen
2021-08-24	rescribe: improve makefile to match the way we deploy to the website	Nick White

2021-08-19	lspipeline-ng: Limit number of book details requests so we don't run into ↵ [v0.5.0]	Nick White
	EC2's rate limiting
2021-08-18	rescribe: Update documentation on how to deal with M1 signing, and move ↵	Nick White
	makefile to where it makes sense
2021-08-17	pdf: Stretch words to fit in their boxes, for more perfect embedding	Nick White
	- Words are stretched to fit their boxes, which means the accuracy is now very high indeed. This was done by modifying gofpdf to add the SetCellStretchToFit function, which will hopefully be upstreamed in due course.
	- Copy pasting from a PDF works well with lines rarely if ever being erroneously broken by the PDF reader. There was quite a bit of trial-and-error to improve this, and the stretched text plus a space being added after the word in CellFormat was the best (plus preserves accuracy of word and character locations).
2021-08-17	pipeline: use regular storage for tests, rather than a separate one	Nick White

2021-08-09	pdf: use same line height and origin for all words on a line as it makes ↵	Nick White
	things neater in the PDF in most cases