Age | Commit message | Author | Ref |
---|---|---|---|
2021-02-01 | Ensure DeleteObjects can handle over 1000 files to delete; fixes rmbook for large books | Nick White | |
2021-01-26 | Make ListObjectsWithMeta generic again and create a specialised ListObjectWithMeta for single file listing, so we can still be as fast, but do not have a misleading API | Nick White | |
2021-01-26 | Improve lspipeline concurrency by removing WaitGroup stuff | Nick White | |
2021-01-26 | Speed up lspipeline by making S3 requests concurrently and only processing single results from ListObjects requests | Nick White | |
2021-01-26 | Stop limiting keys returned from listobjectprefixes' API usage; this speeds up the request markedly | Nick White | |
2020-12-15 | [rmbook] Append / to end of bookname, to ensure e.g. "1" doesn't match all books starting with "1" | Nick White | |
2020-12-15 | [rmbook] Add -dryrun flag | Nick White | |
2020-12-14 | Add rmbook tool | Nick White | |
2020-12-14 | Update preproc module used to incorporate an important crash fix | Nick White | |
2020-12-07 | [rescribe] Fix up `*.hocr` glob, which ensures that using a savedir that already has a hocr directory in it will work | Nick White | v0.3.2 |
2020-12-07 | [rescribe] Allow saving of results to somewhere other than a directory named after the book being processed | Nick White | |
2020-12-04 | Ensure mkdir will succeed in upload | Nick White | |
2020-12-03 | [rescribe] Fix portability issue where hocrs may not be correctly moved and txt-ified on Windows | Nick White | |
2020-12-03 | Don't upload binarised pdf twice needlessly. This can also result in the file being uploaded twice simultaneously, as up() runs in a separate goroutine. This can cause failures on Windows, as one upload process attempts to remove the file while the other still has it open for upload. It could probably also fail if one upload completed (and so deleted the file) before the other had started. | Nick White | |
2020-11-30 | Merge branch 'master' of ssh://hammerhead/home/nick/rescribe/src/bookpipeline | Nick White | |
2020-11-30 | Add getstats tool | Nick White | |
2020-11-24 | [booktopipeline] Add a check to disallow adding a book that already exists. This is important because if a book that has already been processed is added again, an analyse job will be added every time a page is OCRed, which will clog up the pipeline with unnecessary work. Also, if a book were added with the same name but differently named files, or a different number of pages, the results would almost certainly not be as intended. If a book really needs to be added under a particular name, either the original directory can be removed on S3, or "v2" or similar can be appended to the book name before calling booktopipeline. | Nick White | |
2020-11-18 | Switch to a maintained version of gofpdf | Nick White | |
2020-11-18 | Describe rescribe tool in documentation | Nick White | v0.3.1 |
2020-11-17 | Add trimqueue and logwholequeue utilities which can help deal with weird queue states | Nick White | |
2020-11-17 | Remove `_bin0.x` from txt filenames | Nick White | v0.3.0 |
2020-11-16 | Some changes to ensure the pipeline works correctly on Windows. There were a couple of places where a file was uploaded and then removed while still open, which causes an error on Windows. The allOCRed function also assumed that the path separator would be `/`, which is always correct for AWS, and correct for local on Linux and OSX, but not for local Windows; fixed by leaving the separator well alone. Also, the local connection was not stripping a leading `\` as it did `/`, which caused an issue with Windows local. Windows local is now tested and working, at least through Wine. | Nick White | |
2020-11-16 | [rescribe] Default to an appropriate tesscmd for Windows | Nick White | |
2020-11-16 | [rescribe] Add txt output, only keep colour pdf, and reorganise files so they're more user-friendly | Nick White | |
2020-11-16 | [rescribe] Mention in usage that things can be saved in a different directory | Nick White | |
2020-11-16 | Add makefile for generating cross compiled rescribe binaries | Nick White | |
2020-11-10 | gofmt | Nick White | |
2020-11-10 | [rescribe] Enable custom paths to tesseract command to be set (also improve some error output) | Nick White | |
2020-11-10 | [rescribe] Change -t to the path of the traineddata file, and set TESSDATA_PREFIX accordingly | Nick White | |
2020-11-10 | [rescribe] Handle errors in processbook correctly, and improve console output | Nick White | |
2020-11-10 | [getpipelinebook] Rewrite to use internal package functions | Nick White | |
2020-11-10 | Switch booktopipeline to use internal pipeline functions | Nick White | |
2020-11-09 | Add a couple of things that should not be forgotten | Nick White | |
2020-11-09 | Switch Preprocess() to take the thresholds to use, and have rescribe tool only use 0.1, 0.2, 0.3 | Nick White | separatelocal |
2020-11-09 | [rescribe] Local only combo tool basically now working. Testing is still minimal. | Nick White | |
2020-11-09 | [rescribe] Work in progress on a self-contained local pipeline processor, called rescribe | Nick White | |
2020-11-09 | [bookpipeline] Split most functionality out to package internal/pipeline. No functionality changes, but this should make it easier to make custom builds using the pipeline in slightly different ways. | Nick White | |
2020-11-09 | Add -autostop, so time to shutdown can be specified, and so the process can just be stopped after a period, rather than the whole computer shut down | Nick White | |
2020-11-09 | [bookpipeline] Improve interface, particularly for local use, by disabling (failing) log saving and mail sending, and removing erroneous references to AWS | Nick White | |
2020-11-09 | Set hocr config options directly rather than relying on 'hocr' config file. This ensures that bookpipeline will still work even if TESSDATA_PREFIX has been set to a directory without configs in it. | Nick White | |
2020-11-06 | Fix the README to be valid markdown in the local example | Nick White | |
2020-11-06 | Document the local mode | Nick White | |
2020-11-06 | Add git clone advice to readme | Nick White | |
2020-10-21 | Fix a bug that caused analyse step to not be triggered with local connection | Nick White | |
2020-10-20 | Improve logging by using Println, which ensures there is a space between arguments, even if all are strings | Nick White | |
2020-10-20 | Fix local queue deletion properly | Nick White | |
2020-10-20 | Hopefully fix off-by-one error causing errors with local bookpipeline | Nick White | |
2020-10-20 | Add postprocess-bythresh cmd | Nick White | |
2020-10-20 | Update spot image to use | Nick White | |
2020-09-22 | [booktopipeline] Check that all images are valid before adding to pipeline | Nick White | |
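
The 2021-02-01 entry concerns a documented limit of S3's DeleteObjects API: one request may list at most 1000 keys, so deleting a large book takes several requests. The following is a minimal sketch of that batching idea in Go. It assumes a hypothetical helper named `chunkKeys` and made-up key names, and it leaves out the actual S3 calls. It is not the bookpipeline implementation.

```go
package main

import "fmt"

// chunkKeys is a hypothetical helper (not from bookpipeline) that splits
// keys into consecutive batches of at most size elements. The batches are
// sub-slices of keys, so they share its backing array.
func chunkKeys(keys []string, size int) [][]string {
	if size <= 0 {
		panic("chunkKeys: size must be positive")
	}
	var batches [][]string
	for len(keys) > size {
		batches = append(batches, keys[:size])
		keys = keys[size:]
	}
	if len(keys) > 0 {
		batches = append(batches, keys)
	}
	return batches
}

func main() {
	// 2500 made-up object keys for a single book.
	keys := make([]string, 2500)
	for i := range keys {
		keys[i] = fmt.Sprintf("examplebook/%04d.png", i)
	}
	// Each batch would become one DeleteObjects request (not made here).
	for i, batch := range chunkKeys(keys, 1000) {
		fmt.Printf("request %d: %d keys\n", i+1, len(batch))
	}
}
```

Run as-is, it prints one line per request, for 1000, 1000 and 500 keys. Each batch would then be sent as its own DeleteObjects request through whichever S3 client is in use.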