Age  Commit message  Author
2021-07-27  internal/pipeline: Add a test for CheckImages  Nick White
2021-07-20  Cleanup thanks to go vet  Nick White
2021-07-19  internal/pipeline: Be more explicit with exactly what functions are in each interface, to ensure no "duplicate function" errors when compiling  Nick White
2021-07-13  Fix up tests a bit  Nick White
2021-07-13  gofmt  Nick White
2021-07-13  internal/pipeline: Reorganise interfaces so that functions only declare what they need  Nick White
            We were using Pipeliner as a catch-all, but it's nicer if the functions can just state that e.g. they need download functionality, so decompose things so that that's how we do things now.
2021-07-13  aws: Only look up test queue id when asked for, as for most Init()s it won't be needed  Nick White
2021-07-12  Add necessary pipeliner dependency for testqueue (probably remove this from internal library later as it's only needed for tests)  Nick White
2021-07-12  Add test for upAndQueue function  Nick White
            This involved adding a test queue, so it can be run safely without interfering with the pipeline.
2021-07-08  rescribe: Exit with an error if directory doesn't exist  Nick White
2021-06-29  rescribe: add documentation on how to generate embedded data  Nick White
2021-06-29  rescribe: Add embed target for darwin (osx) too  Nick White
2021-06-22  rescribe: Remove erroneous unnecessary mkdir  Nick White
2021-06-22  rescribe: Make it clearer that embedded training files are available to use  Nick White
2021-06-22  rescribe: add embedded tesseract for linux  Nick White
2021-06-22  rescribe: allow use of embedded training even if -systess is used  Nick White
2021-06-22  cloud: update spot image to latest version that won't attempt to build rescribe tool  Nick White
2021-06-22  rescribe: Add go generate command to download the needed files to embed  Nick White
2021-06-22  rescribe: Add an embedded tessdata  Nick White
2021-06-21  Merge remote-tracking branch 'ssh/master'  Nick White
2021-06-21  rescribe: Set up so only Tesseract needed for the build platform is embedded  Nick White
2021-06-21  rescribe: Embed Tesseract into binary so that no Tesseract install is necessary  Nick White
2021-06-21  update spot image used  Nick White
2021-06-15  pipeline: Ignore hidden files when checking and uploading  Nick White
            This prevents issues if a .DS_Store file is present in a directory.
2021-05-31  local: Only create a file once we are sure that it will be writeable  Nick White
2021-05-31  Add a test for up(), and document download() and up() properly  Nick White
2021-05-31  Fix bug after changing pipeliner for tests, to ensure DeleteObjects is available to Pipeliner  Nick White
2021-05-19  Close process channel after writing to err channel in download(), in case of an error  Nick White
            This is needed so that in tests the error can be selected out reliably, rather than an empty process signal.
2021-05-19  Add tests for download()  Nick White
2021-05-19  Fix syntax with another Errorf call  Nick White
2021-05-19  Local download now tries to open the source file before creating a destination file, so if it fails an empty file isn't left behind  Nick White
2021-05-19  Add basic DeleteObjects implementation to local.go  Nick White
2021-05-19  Fix syntax for some fmt.Errorf calls  Nick White
2021-04-12  Update preproc dependency  Nick White
2021-03-16  rescribe: change default training directory to trainings/v0.3.3  Nick White
2021-02-22  lspipeline: Rename to lspipeline-ng, and restore pre concurrency version to lspipeline as there are some hard to debug issues in concurrency version  Nick White
2021-02-15  getsamplepages: Add -prefix option, and use 'best' to get random page numbers  Nick White
            The -prefix option is useful to us. Previously only a .jpg for page number 100 was retrieved, which failed if the book had fewer (or unusually named) pages, and also didn't provide a corresponding .hocr at all (bug introduced with 48958d2). Using 'best', which is (effectively) randomly sorted, provides a page that is guaranteed to exist, and a random one at that.
2021-02-05  Merge branch 'master' of ssh://ssh.phx.nearlyfreespeech.net/home/public/bookpipeline  Nick White
2021-02-05  Update go-chart dependency  Nick White
2021-02-01  Update AWS dependency to 1.37.1  Nick White
2021-02-01  Ensure DeleteObjects can handle over 1000 files to delete; fixes rmbook for large books  Nick White
2021-01-26  Make ListObjectsWithMeta generic again and create a specialised ListObjectWithMeta for single file listing, so we can still be as fast, but do not have a misleading api  Nick White
2021-01-26  Improve lspipeline concurrency by removing WaitGroup stuff  Nick White
2021-01-26  Speed up lspipeline by making s3 requests concurrently and only processing single results from ListObjects requests  Nick White
2021-01-26  Stop limiting keys returned from listobjectprefixes' api usage; this speeds up the request markedly  Nick White
2020-12-15  [rmbook] Append / to end of bookname, to ensure e.g. "1" doesn't match all books starting with "1"  Nick White
2020-12-15  [rmbook] Add -dryrun flag  Nick White
2020-12-14  Add rmbook tool  Nick White
2020-12-14  Update preproc module used to incorporate an important crash fix  Nick White
2020-12-07  [rescribe] Fix up *.hocr glob, which ensures that using a savedir that already has a hocr directory in it will work  (v0.3.2)  Nick White