| Age | Commit message | Author | 
 | 
change)
 | 
 | 
 | 
 | 
 | 
 | 
 | 
 | 
 | 
 | 
output for training file not found, so that it's clear that the file specified may not exist
 | 
 | 
 | 
 | 
EC2's rate limiting
 | 
 | 
makefile to where it makes sense
 | 
 | 
 | 
 | 
 | 
 | 
 | 
 | 
 | 
 | 
 | 
 | 
internal library later as it's only needed for tests
 | 
 | 
This involved adding a test queue, so it can be run safely without
interfering with the pipeline.
 | 
 | 
 | 
 | 
 | 
 | 
 | 
 | 
 | 
 | 
 | 
 | 
 | 
 | 
 | 
 | 
 | 
 | 
 | 
 | 
 | 
 | 
 | 
 | 
available to Pipeliner
 | 
 | 
 | 
 | 
lspipeline as there are some hard-to-debug issues in the concurrency version
 | 
 | 
The -prefix option is useful to us.
Previously only a .jpg for page number 100 was retrieved, which
failed if the book had fewer (or unusually named) pages, and also
didn't provide a corresponding .hocr at all (bug introduced with
48958d2). Using 'best', which is (effectively) randomly sorted,
provides a page that is guaranteed to exist, and a random one at that.
 | 
 | 
ListObjectWithMeta for single-file listing, so we can still be as fast, but do not have a misleading API
 | 
 | 
 | 
 | 
single results from ListObjects requests
 | 
 | 
books starting with "1"
 | 
 | 
 | 
 | 
 | 
 | 
already has a hocr directory in it will work
 | 
 | 
after the book being processed
 | 
 | 
txt-ified on Windows
 | 
 | 
 | 
 | 
 | 
 | 
This is important because if a book that has already been processed
is added again, an analyse job will be added every time a page is
OCRed, which will clog up the pipeline with unnecessary work. Also, if
a book was added with the same name but differently named files, or a
different number of pages, the results would almost certainly not be
as intended.
If a book really needs to be added under a particular name, either
the original directory can be removed on S3, or "v2" or similar can
be appended to the book name before calling booktopipeline.
 | 
 | 
queue states
 | 
 | 
 | 
 | 
 | 
 | 
they're more user-friendly
 | 
 | 
 | 
 | 
 | 
 | 
some error output)
 |