Age | Commit message | Author
---|---|---
2024-08-22 | Don't skip binarisation for folders full of images, as that's nobody's workflow anymore | Nick White
2024-01-28 | Fix issue with directory page images with spaces in the name causing processing errors (branch: guirefactor) | Nick White
| | Any page name with a space in it (like "page 01.jpg") would cause all book processing to fail, because the queue can't handle file names containing spaces. Fix that by replacing any spaces with underscores in the temporary pipeline files. |
2022-03-21 | Only generate full-size PDF if requested | Nick White | |
| | This avoids the issue that large PDFs require a lot of RAM, so there is a real risk of running out of memory. Plus it's a waste of space and time. |
2022-03-11 | Separate out fullsize pdf creation from colour pdf creation, so less memory is needed (branch: fullsizepdf) | Nick White
2022-03-11 | Add initial support for full-size PDF generation | Nick White | |
| | Some issues: 1) The PDF generation stores every page in memory while it constructs the PDF, so there's a higher chance of failure due to running out of memory. There's no getting around this except by improving the PDF generation library, which is not easy. 2) Currently I've just changed the pipeline to always generate these full-size PDFs, and the rescribe tool will just delete them if they weren't requested. This is bad in particular because of point 1, and would probably cause failures in the server pipeline as a result. Therefore the plan is to add a tag to queue messages so that full-size generation can be selectively enabled. Also, it should be split from the loop with colour PDF generation, as holding them both in RAM at the same time is unnecessary. |
2022-02-28 | adjusted file renaming to make suffixes of png and jpg files lowercase and change jpeg to jpg | Antonia Rescribe
2022-02-28 | Add PreNoWipe queue, that just does binarisation but no wiping | Nick White | |
2022-02-21 | Ensure that no new console windows are opened on Windows when executing Tesseract | Nick White
2022-01-31 | pipeline: Fail if no images are present | Nick White | |
2022-01-31 | Make pipeline context-aware, so the rescribe tool can cancel jobs | Nick White | |
2022-01-17 | internal/pipeline: if a graph cannot be created, don't leave an empty graph.png file, and allow failure to download it, as it won't be created in the case of a 1-page book, which is fine | Nick White
2022-01-10 | internal/pipeline: Have DownloadPdfs() try to download all PDFs, but only return an error if none were downloaded, as there are times when the colour PDF will not exist, which is fine | Nick White
2021-12-20 | whitespace and error clarity changes | Nick White | |
2021-12-20 | fixed -png flag and changed rescribe tool to save binarized png in separate folder | Antonia Rescribe
2021-12-06 | pipeline: ignore any files with a non-image suffix, rather than erroring on them | Nick White | |
2021-11-23 | gofmt, plus update documentation of recently changed pipeline.UploadImages | Nick White | |
2021-11-22 | internal/pipeline: remove old and broken requirement for TestStorageId() | Nick White | |
2021-11-22 | changed put.go so that a 4-digit number is appended to the end of each filename when images are uploaded to the pipeline | Antonia Rescribe
2021-08-17 | pipeline: use regular storage for tests, rather than a separate one | Nick White | |
2021-08-02 | internal/pipeline: Add test (incomplete but working) for UploadImages | Nick White | |
2021-07-27 | internal/pipeline: Add test to check that hidden files are skipped | Nick White | |
2021-07-27 | internal/pipeline: add tests for DetectQueueType | Nick White | |
2021-07-27 | internal/pipeline: Add notreadable test to CheckImages | Nick White | |
2021-07-27 | internal/pipeline: Add a test for CheckImages | Nick White | |
2021-07-19 | internal/pipeline: Be more explicit with exactly what functions are in each interface, to ensure no "duplicate function" errors when compiling | Nick White
2021-07-13 | Fix up tests a bit | Nick White | |
2021-07-13 | gofmt | Nick White | |
2021-07-13 | internal/pipeline: Reorganise interfaces so that functions only declare what they need | Nick White
| | We were using Pipeliner as a catch-all, but it's nicer if functions can just state what they need (e.g. download functionality), so decompose the interfaces accordingly. |
2021-07-12 | Add test for upAndQueue function | Nick White | |
| | This involved adding a test queue, so the test can be run safely without interfering with the pipeline. |
2021-06-15 | pipeline: Ignore hidden files when checking and uploading | Nick White | |
| | This prevents issues if a .DS_Store file is present in a directory. |
2021-05-31 | Add a test for up(), and document download() and up() properly | Nick White | |
2021-05-31 | Fix bug after changing pipeliner for tests, to ensure DeleteObjects is available to Pipeliner | Nick White
2021-05-19 | Close process channel after writing to err channel in download(), in case of an error | Nick White
| | This is needed so that in tests the error can be selected out reliably, rather than an empty process signal. |
2021-05-19 | Add tests for download() | Nick White | |
2021-05-19 | Fix syntax with another Errorf call | Nick White | |
2021-05-19 | Fix syntax for some fmt.Errorf calls | Nick White | |
2020-12-07 | [rescribe] Allow saving of results to somewhere other than a directory named after the book being processed | Nick White
2020-12-03 | Don't upload binarised pdf twice needlessly | Nick White | |
| | This can also result in the file being uploaded twice simultaneously, as up() is running in a separate goroutine. This can cause failures on Windows, as one upload process attempts to remove the file while the other still has it open for upload. It could probably also fail if one process completed (deleting the file) before the other had started. |
2020-11-16 | Some changes to ensure the pipeline works correctly on Windows | Nick White | |
| | There were a couple of places where a file was uploaded while still open, so the subsequent attempt to remove it caused an error on Windows. The allOCRed function also assumed that the path separator would be `/`, which is always correct for AWS, and correct for local on Linux and OSX, but not for local Windows. Fixed by leaving the separator well alone. Also, the local connection was not stripping a leading `\` as it did `/`, which caused an issue with Windows local. Windows local is now tested and working, at least through Wine. |
2020-11-10 | gofmt | Nick White | |
2020-11-10 | [rescribe] Enable custom paths to tesseract command to be set (also improve some error output) | Nick White
2020-11-10 | [getpipelinebook] Rewrite to use internal package functions | Nick White | |
2020-11-09 | Switch Preprocess() to take the thresholds to use, and have rescribe tool only use 0.1,0.2,0.3 (branch: separatelocal) | Nick White
2020-11-09 | [rescribe] Local only combo tool basically now working. Testing is still minimal. | Nick White
2020-11-09 | [rescribe] work in progress at a self-contained local pipeline processor, called rescribe | Nick White
2020-11-09 | [bookpipeline] Split most functionality out to package internal/pipeline | Nick White | |
| | No functionality changes, but this should make it easier to make custom builds using the pipeline in slightly different ways. |