Age | Commit message | Author
2020-11-09 | [bookpipeline] Split most functionality out to package internal/pipeline | Nick White
No functionality changes, but this should make it easier to make custom builds using the pipeline in slightly different ways.
2020-11-09 | Add -autostop, so time to shutdown can be specified, and so the process can just be stopped after a period, rather than the whole computer shut down | Nick White
2020-11-09 | [bookpipeline] Improve interface, particularly for local use, by disabling (failing) log saving, mail sending, and removing erroneous references to AWS | Nick White
2020-11-09 | Set hocr config options directly rather than relying on 'hocr' config file | Nick White
This ensures that bookpipeline will still work even if TESSDATA_PREFIX has been set to a directory without configs in it.
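As a rough illustration of the idea in this commit, the sketch below builds a tesseract command line that requests hOCR output with an explicit `-c` variable instead of naming the `hocr` config file, so nothing has to be looked up under `TESSDATA_PREFIX/configs`. The helper name `hocrCommand` and the file names are hypothetical, and `tessedit_create_hocr` is assumed to be the relevant variable; the exact options bookpipeline sets are not shown in this log.

```go
package main

import (
	"fmt"
	"os/exec"
)

// hocrCommand builds (but does not run) a tesseract invocation that asks
// for hOCR output through an explicit -c variable rather than the "hocr"
// config file. Hypothetical helper, not taken from the bookpipeline source.
func hocrCommand(image, outBase string) *exec.Cmd {
	return exec.Command("tesseract", image, outBase, "-c", "tessedit_create_hocr=1")
}

func main() {
	cmd := hocrCommand("page.png", "page")
	// Print the argument list that would be executed.
	fmt.Println(cmd.Args)
}
```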
2020-11-06 | Fix the README to be valid markdown in the local example | Nick White
2020-11-06 | Document the local mode | Nick White
2020-11-06 | Add git clone advice to readme | Nick White
2020-10-21 | Fix a bug that caused analyse step to not be triggered with local connection | Nick White
2020-10-20 | Improve logging by using Println, which ensures there is a space between arguments, even if all are strings | Nick White
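The spacing difference this commit relies on can be seen in a small sketch. `fmt.Sprint` (and `log.Print`) only insert a space between operands when neither is a string, whereas `fmt.Sprintln` (and `log.Println`) always do. The helper names and the sample arguments are illustrative only.

```go
package main

import "fmt"

// joinPrint formats its arguments the way fmt.Print and log.Print do.
func joinPrint(args ...interface{}) string { return fmt.Sprint(args...) }

// joinPrintln formats its arguments the way fmt.Println and log.Println do.
func joinPrintln(args ...interface{}) string { return fmt.Sprintln(args...) }

func main() {
	fmt.Printf("%q\n", joinPrint("Downloading", "page.png"))   // "Downloadingpage.png"
	fmt.Printf("%q\n", joinPrintln("Downloading", "page.png")) // "Downloading page.png\n"
}
```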
2020-10-20 | Fix local queue deletion properly | Nick White
2020-10-20 | Hopefully fix off-by-one error causing errors with local bookpipeline | Nick White
2020-10-20 | Add postprocess-bythresh cmd | Nick White
2020-10-20 | Update spot image to use | Nick White
2020-09-22 | [booktopipeline] Check that all images are valid before adding to pipeline | Nick White
2020-09-15 | Abort and delete a failed wipeonly job, like we do with preprocessing | Nick White
There was no reason not to do this with wipeonly as well, and sure enough a single broken PNG image in a wipeonly task would cause the queue to exponentially fill as happened previously.
2020-09-07 | Update spot instance ami once again | Nick White
2020-09-01 | Update spot instance ami to use | Nick White
2020-09-01 | Fix confusing usage message for booktopipeline | Nick White
2020-08-24 | update getsamplepages to just get jpg pages | Nick White
2020-08-19 | Add getsamplepages | Nick White
2020-08-18 | Update preproc to v0.4.0 to enable vertical wiping | Nick White
2020-07-28 | Allow override of autodetected queues for booktopipeline | Nick White
2020-07-28 | Autodetect queue for booktopipeline based on file extension | Antonia Karaisl
2020-07-27 | Use os.Getenv() to find config dir, rather than rely on os.UserConfigDir(), as that isn't present on go1.11 | Nick White
2020-07-27 | Update AMI to new one which includes a mailsettings file | Nick White
2020-07-27 | Switch mail settings to an externally set file | Nick White
2020-07-21 | [bookpipeline] If preprocessing fails, email us and remove the job from the queue | Nick White
This prevents the current situation where a failed preprocessing job is endlessly repeated, potentially spawning thousands of ocrpage jobs in its wake each time. Note that the email stuff works but requires putting secrets into .go files, so need to rewrite that to read from somewhere more sensible like a dotfile on the host.
2020-07-20 | Fix typo | Nick White
2020-07-20 | Merge branch 'master' of https://git.rescribe.xyz/bookpipeline | Nick White
2020-07-20 | Update preproc to v0.1.4 to take advantage of vertical wiping parameters, and change WipeFile() to take advantage of them (tag: v0.2.5) | Nick White
2020-06-16 | [getallhocrs] Skip files which have already been downloaded | Nick White
2020-06-15 | Add getallhocrs tool | Nick White
2020-06-03 | Hopefully fix last bug in analyse step of bookpipeline | Nick White
2020-06-03 | Fix bug in analyse step of bookpipeline | Nick White
2020-06-02 | Fix race condition that could cause errors to be silently discarded | Nick White
This was a nasty one. By closing the up channel, the up() function would finish and send to the done channel. This means that the select between err and done would be random as to which was picked, whereas of course if there has been an error that path must be taken.
2020-06-02 | Proper full fix for local queue handling (hopefully) | Nick White
2020-06-02 | Fix bug with local queue deletion | Nick White
2020-06-01 | Mention documentation URL | Nick White
2020-05-29 | [bookpipeline] Remove local copy of original page image once preprocessed | Nick White
2020-05-29 | Merge branch 'minimisedisk' (tag: v0.2.4) | Nick White
2020-05-26 | Merge branch 'local' | Nick White
2020-05-26 | Add -c conntype for necessary tools to allow local connection to be used | Nick White
2020-05-26 | Fix DelFromQueue and Upload for local connections | Nick White
2020-05-22 | Fix CheckQueue for LocalConn (branch: local) | Nick White
2020-05-22 | Fix bookpipeline failing if shutdown option isn't used | Nick White
2020-05-22 | Fix bookpipeline failing if shutdown option isn't used | Nick White
2020-05-22 | Add experimental local connection type | Nick White
2020-05-22 | [untested] Use less disk space (branch: minimisedisk) | Nick White
There are several ways that disk usage is reduced with this patch:
- Files are deleted as soon as they have been uploaded
- Once a page image has been added to a PDF, immediately delete it
This should allow much larger books to be processed without needing bigger disks.
2020-05-19 | Add getandpurgequeue debugging tool | Nick White
2020-05-06 | Update spot image again | Nick White