Age | Commit message | Author |
|
folder
|
upload
|
There are several TODO items before this can be considered "good
enough", let alone complete. See the comments in the code for
details.
On a good day, with a fair wind, though, this works.
|
change)
|
output for training file not found, so that it's clear that the file specified may not exist
|
EC2's rate limiting
|
makefile to where it makes sense
|
internal library later as it's only needed for tests
|
This involved adding a test queue, so it can be run safely without
interfering with the pipeline.
|
available to Pipeliner
|
lspipeline as there are some hard-to-debug issues in the concurrency version
|
The -prefix option is useful to us.
Previously only a .jpg for page number 100 was retrieved, which
failed if the book had fewer (or unusually named) pages, and also
didn't provide a corresponding .hocr at all (bug introduced with
48958d2). Using 'best', which is (effectively) randomly sorted,
provides a page that is guaranteed to exist, and a random one at that.
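
A minimal sketch of the idea above, in Python for illustration only (not
the project's actual code). It assumes that 'best' is a plain text
listing with one page's .hocr file name per line, and that the matching
.jpg shares the same stem; both are assumptions, as is the helper name
pick_sample_page.

```python
def pick_sample_page(best_lines):
    """Return (jpg_name, hocr_name) for the first usable entry in 'best'.

    Assumption: each non-empty line names one page's .hocr file, and the
    page image has the same stem with a .jpg extension.
    """
    for line in best_lines:
        name = line.strip()
        if not name:
            continue  # skip blank lines
        # Strip a trailing ".hocr" if present to get the page stem.
        stem = name[:-len(".hocr")] if name.endswith(".hocr") else name
        return stem + ".jpg", stem + ".hocr"
    # Unlike a hardcoded page 100, an empty listing is reported explicitly.
    raise ValueError("'best' listing is empty")
```

Because the listing only names pages that exist, the first entry works
for short books and for unusually named pages, and the .jpg and .hocr
always refer to the same page.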
|
ListObjectWithMeta for single file listing, so we can still be as fast, but do not have a misleading API
|
single results from ListObjects requests
|
books starting with "1"
|
already has a hocr directory in it will work
|
after the book being processed
|