path: root/internal/pipeline/pipeline.go
Age  Commit message  Author
2022-03-21  Only generate full-size PDF if requested  Nick White
This avoids the issue that large PDFs require a lot of RAM, so there are chances of running out of memory. Plus it's a waste of space and time.
2022-03-11  Separate out fullsize pdf creation from colour pdf creation, so less memory is needed  [fullsizepdf]  Nick White
2022-03-11  Add initial support for full-size PDF generation  Nick White
Some issues:
1) The PDF generation stores every page in memory while it constructs the PDF. That means there's a higher chance of failure due to running out of memory with these. There's no getting around this except by improving the PDF generation library, which is not easy.
2) Currently I've just changed the pipeline to always generate these full-size PDFs, and then the rescribe tool will just delete them if they weren't requested. This is bad in particular because of point 1, and would probably cause failures in the server pipeline as a result.
Therefore the plan is to add a tag to queue messages so that full-size generation can be selectively enabled. Also, it should be split from the loop with colour PDF generation, as holding them both in RAM at the same time is unnecessary.
2022-02-28  Add PreNoWipe queue, that just does binarisation but no wiping  Nick White
2022-02-21  Ensure that no new console windows are opened on Windows when executing Tesseract  Nick White
2022-01-31  Make pipeline context-aware, so the rescribe tool can cancel jobs  Nick White
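As a rough illustration of what "context-aware" can mean for a page-by-page job, here is a minimal sketch in which the loop checks the context before each page and stops once it has been cancelled. The names `processPages` and `cancelAfter` are hypothetical and are not taken from pipeline.go; the real pipeline may wire cancellation differently.

```go
package main

import (
	"context"
	"fmt"
)

// processPages is a hypothetical stand-in for a pipeline stage. It calls
// work for each page, but checks ctx first so a caller can cancel the job
// between pages. It returns how many pages were processed.
func processPages(ctx context.Context, pages []string, work func(string)) (int, error) {
	done := 0
	for _, p := range pages {
		select {
		case <-ctx.Done():
			// The job was cancelled; report how far we got and why we stopped.
			return done, ctx.Err()
		default:
		}
		work(p)
		done++
	}
	return done, nil
}

// cancelAfter runs processPages and cancels the context once n pages have
// been worked on, imitating a user cancelling a job part-way through.
// With n == 0 the context is never cancelled during the run.
func cancelAfter(n int, pages []string) (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	count := 0
	return processPages(ctx, pages, func(string) {
		count++
		if count == n {
			cancel()
		}
	})
}

func main() {
	pages := []string{"0001.png", "0002.png", "0003.png", "0004.png"}
	n, err := cancelAfter(2, pages)
	fmt.Println("pages processed:", n, "error:", err)
}
```

Checking the context only between pages keeps the sketch simple; work already in progress on a page is not interrupted.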
2022-01-17  internal/pipeline: if a graph cannot be created, don't leave an empty graph.png file, and allow failure to download that as it won't be created in the case of a 1 page book, which is fine  Nick White
2021-07-19  internal/pipeline: Be more explicit with exactly what functions are in each interface, to ensure no "duplicate function" errors when compiling  Nick White
2021-07-13  internal/pipeline: Reorganise interfaces so that functions only declare what they need  Nick White
We were using Pipeliner as a catch-all, but it's nicer if the functions can just state that e.g. they need download functionality, so decompose things so that that's how we do things now.
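The idea of functions declaring only what they need can be sketched as follows. This is an illustration only: `Downloader`, `Uploader`, `DownloadUploader`, `memStore` and `fetch` are hypothetical names, and their method signatures are assumptions rather than the actual interfaces in internal/pipeline.

```go
package main

import "fmt"

// Downloader and Uploader are hypothetical single-purpose interfaces.
type Downloader interface {
	Download(key string) ([]byte, error)
}

type Uploader interface {
	Upload(key string, data []byte) error
}

// DownloadUploader composes the two for functions that need both.
type DownloadUploader interface {
	Downloader
	Uploader
}

// memStore is an in-memory stand-in for a storage connection.
type memStore struct {
	files map[string][]byte
}

func newMemStore() *memStore {
	return &memStore{files: map[string][]byte{}}
}

func (m *memStore) Download(key string) ([]byte, error) {
	d, ok := m.files[key]
	if !ok {
		return nil, fmt.Errorf("no such key: %s", key)
	}
	return d, nil
}

func (m *memStore) Upload(key string, data []byte) error {
	m.files[key] = data
	return nil
}

// fetch only needs download functionality, so that is all it asks for,
// rather than accepting a catch-all interface.
func fetch(d Downloader, key string) (string, error) {
	b, err := d.Download(key)
	if err != nil {
		return "", fmt.Errorf("fetching %s: %w", key, err)
	}
	return string(b), nil
}

func main() {
	s := newMemStore()
	if err := s.Upload("page1.txt", []byte("hello")); err != nil {
		fmt.Println("upload failed:", err)
		return
	}
	text, err := fetch(s, "page1.txt")
	fmt.Println(text, err)
}
```

One benefit of narrow parameters is that a test double only has to implement the methods the function under test actually calls.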
2021-07-12  Add test for upAndQueue function  Nick White
This involved adding a test queue, so it can be run safely without interfering with the pipeline.
2021-05-31  Add a test for up(), and document download() and up() properly  Nick White
2021-05-19  Close process channel after writing to err channel in download(), in case of an error  Nick White
This is needed so that in tests the error can be selected out reliably, rather than an empty process signal.
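The ordering issue described above can be sketched like this. It is a simplified illustration, not the real download(): the function signature, the channel types and the `collect` helper are assumptions. The point is that a closed channel is always ready to receive from, so closing `process` before sending the error would let a `select` pick up an empty process value instead of the error.

```go
package main

import (
	"errors"
	"fmt"
)

// download is a hypothetical stand-in for the pipeline's download(). It
// sends each name on process, or reports a failure on errc.
func download(names []string, process chan<- string, errc chan<- error) {
	for _, n := range names {
		if n == "" {
			// Send the error first. errc is unbuffered, so this blocks
			// until the receiver has taken the error; only then is
			// process closed.
			errc <- errors.New("download failed: empty name")
			close(process)
			return
		}
		process <- n
	}
	close(process)
}

// collect receives names until process is closed or an error arrives.
func collect(names []string) ([]string, error) {
	process := make(chan string)
	errc := make(chan error)
	go download(names, process, errc)

	var got []string
	for {
		select {
		case err := <-errc:
			return got, err
		case n, ok := <-process:
			if !ok {
				// process was closed without an error: all done.
				return got, nil
			}
			got = append(got, n)
		}
	}
}

func main() {
	got, err := collect([]string{"0001.png", "", "0003.png"})
	fmt.Println("received:", got, "error:", err)
}
```

Because the error send completes before `close(process)` runs, the receiver in this sketch always sees the error rather than the closed channel.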
2021-05-19  Add tests for download()  Nick White
2021-05-19  Fix syntax for some fmt.Errorf calls  Nick White
2020-12-03  Don't upload binarised pdf twice needlessly  Nick White
This can also result in the file being uploaded twice simultaneously, as up() is running in a separate goroutine. This can cause failures on Windows, as one upload process attempts to remove the file while the other still has it open for upload. It could probably also fail if one upload completed (so the file was deleted) before the other started.
2020-11-16  Some changes to ensure the pipeline works correctly on Windows  Nick White
There were a couple of places where a file was uploaded while still open, which resulted in an attempt to remove it, which causes an error from Windows. The allOCRed function also included an assumption that the path separator would be a /, which is always correct for AWS, and correct for local on Linux and OSX, but not for local Windows. Fixed by leaving the separator well alone. Also, the local connection was not stripping leading \, like it did /, which caused an issue with Windows local. Windows local is now tested and working, at least through wine.
2020-11-10  gofmt  Nick White
2020-11-10  [rescribe] Enable custom paths to tesseract command to be set (also improve some error output)  Nick White
2020-11-10  [getpipelinebook] Rewrite to use internal package functions  Nick White
2020-11-09  Switch Preprocess() to take the thresholds to use, and have rescribe tool only use 0.1,0.2,0.3  [separatelocal]  Nick White
2020-11-09  [rescribe] work in progress at a self-contained local pipeline processor, called rescribe  Nick White