bookpipeline - Tools to process books in a cloud-based pipeline system

Age	Commit message	Author
2022-03-21	rescribe: Update copyright years and add TODO file	Nick White

2022-03-21	rescribe: Update traineddata descriptions in command line version	Nick White

2022-03-21	rescribe: remove unneeded old tessdata	Nick White

2022-03-21	Update tessdata to only include a few trainings	Nick White

2022-03-21	rescribe: fix bug in gui where choosing "other" then cancelling would leave the "other" training selected	Nick White

2022-03-21	rescribe: move getBookIdFromUrl() to gbook.go, and add tests for it	Nick White

2022-03-21	rescribe: Remove unneeded clause and add example URLs for gbook id function	Nick White

2022-03-21	Add support for new type of Google Books URLs	Antonia Rescribe

2022-03-21	rescribe: Simplify disabling and enabling common widgets	Nick White

2022-03-21	rescribe: disable & enable checkboxes alongside other parts when processing	Nick White

2022-03-21	rescribe: Improve cli wording and simplify PDF stuff slightly	Nick White

2022-03-21	Only generate full-size PDF if requested	Nick White
	This avoids the issue that large PDFs require a lot of RAM, which risks running out of memory. It also avoids wasting space and time when they aren't needed.
2022-03-11	Add initial support for full-size PDF generation	Nick White
	Some issues:
	1) The PDF generation stores every page in memory while it constructs the file, so there's a higher chance of failure due to running out of memory. There's no getting around this except by improving the PDF generation library, which is not easy.
	2) Currently I've just changed the pipeline to always generate these full-size PDFs, and the rescribe tool then deletes them if they weren't requested. This is bad, in particular because of point 1, and would probably cause failures in the server pipeline as a result.
	Therefore the plan is to add a tag to queue messages so that full-size generation can be selectively enabled. Full-size generation should also be split from the loop with colour PDF generation, as holding both in RAM at the same time is unnecessary.
2022-03-11	Name PDF extracted images so they sort correctly	Nick White

2022-02-28	rescribe: improve layout of completed popup	Nick White

2022-02-28	rescribe: Further improve getembeds error reporting and recognition	Nick White

2022-02-28	rescribe: Fix error printing with getembeds	Nick White

2022-02-28	rescribe: Add embed_darwin.go to include getgbook in OSX builds	Nick White

2022-02-28	rescribe: Add " searchable" to file name for saved PDF	Nick White

2022-02-28	rescribe: Add popup on completion reporting where files were saved	Nick White

2022-02-28	rescribe: Improve wording of training dropdown	Nick White

2022-02-28	Add PreNoWipe queue, that just does binarisation but no wiping	Nick White

2022-02-28	bookpipeline: ensure context is initialised before using it, to avoid panic	Nick White

2022-02-28	bookpipeline: Switch to rescribev9 as default training	Nick White

2022-02-24	rescribe: Add embedded getgbook	Nick White

2022-02-24	rescribe: Improve getgbook failure error dialog by keeping it simple	Nick White

2022-02-23	rescribe: fix typo with embedded getgbook running	Nick White

2022-02-23	Add getgbook embedding for darwin	Nick White

2022-02-23	rescribe: Add embedded support for getgbook, for linux only so far	Nick White

2022-02-21	rescribe: Add getgbook use to the GUI (not embedded yet)	Nick White

2022-02-21	Ensure that no new console windows are opened on Windows when executing Tesseract	Nick White

2022-02-21	rescribe: add .zip version of .app for mac	Nick White

2022-02-14	rescribe: Add gui elements for getgbook integration (wip)	Nick White

2022-02-09	rescribe: ensure go generate is called when needed in makefile, and remove unneeded rules	Nick White

2022-02-09	rescribe: makefile now supports cross-compiling with fyne	Nick White
	This also necessitated a version bump to fyne.
2022-02-09	rescribe: make go generate skip already downloaded files, checking that the checksum matches the expected value, for safety	Nick White

2022-01-31	rescribe: remove unnecessary extra cancel calls; anything which errors should clean up well enough, and the extra calls could introduce harder-to-find bugs	Nick White

2022-01-31	Ensure cancel is sent to any errant processes in case of an error, and stick with "Start OCR" for go button	Nick White

2022-01-31	rescribe: Add context cancelling to extractPdfImgs(), so it's no longer possible to get the gui into a bad state by cancelling before startProcess has begun (hopefully)	Nick White

2022-01-31	rescribe: Ensure status isn't overwritten after an abort during wipe-only preprocessing	Nick White

2022-01-31	rescribe: fix bug where a successful run would segfault	Nick White

2022-01-31	Make pipeline context-aware, so the rescribe tool can cancel jobs	Nick White

2022-01-17	rescribe: Surface errors properly, with a dialogue box	Nick White

2022-01-17	rescribe: Show friendly names for the trainings, and hide "osd" training	Nick White

2022-01-10	rescribe: Increase size of file & folder picker dialog windows	Nick White

2022-01-10	rescribe: Put log in an accordion, disable buttons when processing, and don't lock gui when processing	Nick White

2022-01-10	rescribe: ensure books with a space in the name are handled correctly in the gui	Nick White

2022-01-10	rescribe: Rename PDFs, taking into account that in some cases either the binarised or the colour PDF may not exist	Nick White

2022-01-10	rescribe: handle PDF errors much more gracefully	Nick White

2022-01-04	rescribe: parse stdout and set progress bar based on it, using appropriate labels for the progress bar text to show what's being done	Nick White