| Age | Commit message | Author |
|---|---|---|
|  | internal library later as it's only needed for tests | 
|  | This involved adding a test queue, so it can be run safely without interfering with the pipeline. | 
|  |  | 
|  |  | 
|  |  | 
|  |  | 
|  |  | 
|  |  | 
|  |  | 
|  |  | 
|  |  | 
|  |  | 
|  |  | 
|  | available to Pipeliner | 
|  |  | 
|  | lspipeline as there are some hard-to-debug issues in the concurrency version | 
|  | The -prefix option is useful to us. Previously only a .jpg for page number 100 was retrieved, which failed if the book had fewer (or unusually named) pages, and also didn't provide a corresponding .hocr at all (bug introduced with 48958d2). Using 'best', which is (effectively) randomly sorted, provides a page that is guaranteed to exist, and a random one at that. | 
|  | ListObjectWithMeta for single file listing, so we can still be as fast, but do not have a misleading API | 
|  |  | 
|  | single results from ListObjects requests | 
|  | books starting with "1" | 
|  |  | 
|  |  | 
|  | already has a hocr directory in it will work | 
|  | after the book being processed | 
|  | txt-ified on Windows | 
|  |  | 
|  |  | 
|  | This is important because if a book is added which has already been done, then an analyse job will be added every time a page is OCRed, which will clog up the pipeline with unnecessary work. Also, if a book was added with the same name but differently named files, or a different number of pages, the results would almost certainly not be as intended. If a book really does need to be added with a particular name, either the original directory can be removed on S3, or "v2" or similar can be appended to the book name before calling booktopipeline. | 
|  | queue states | 
|  |  | 
|  |  | 
|  | they're more user-friendly | 
|  |  | 
|  |  | 
|  | some error output) | 
|  | TESSDATA_PREFIX accordingly | 
|  |  | 
|  |  | 
|  |  | 
|  |  | 
|  | only use 0.1,0.2,0.3 | 
|  | minimal. | 
|  | called rescribe | 
|  | No functionality changes, but this should make it easier to make custom builds using the pipeline in slightly different ways. | 
|  | just be stopped after a period, rather than the whole computer shut down | 
|  | (failing) log saving, mail sending, and removing erroneous references to AWS | 
|  | This ensures that bookpipeline will still work even if TESSDATA_PREFIX has been set to a directory without configs in it. | 
|  | arguments, even if all are strings | 
|  |  |
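
One entry in the log above explains why adding a book that already exists is harmful: repeated analyse jobs clog the pipeline, and mismatched files give unintended results. A minimal sketch of a guard against this might look like the following. It is an illustration only: the `Lister` interface, `memStore` type and `checkBookAbsent` function are hypothetical names, not the project's actual API, and the real code may perform the check quite differently.

```go
package main

import (
	"fmt"
	"strings"
)

// Lister is a hypothetical stand-in for whatever storage client the
// pipeline uses (for example an S3 wrapper); it is not the real API.
type Lister interface {
	// ListKeys returns all object keys that start with prefix.
	ListKeys(prefix string) ([]string, error)
}

// memStore is an in-memory Lister used only for this illustration.
type memStore []string

func (m memStore) ListKeys(prefix string) ([]string, error) {
	var out []string
	for _, k := range m {
		if strings.HasPrefix(k, prefix) {
			out = append(out, k)
		}
	}
	return out, nil
}

// checkBookAbsent returns an error if any object already exists under
// "<bookname>/", so the caller can refuse to upload a duplicate book.
// The trailing slash keeps "book1" from matching "book10/...".
func checkBookAbsent(l Lister, bookname string) error {
	keys, err := l.ListKeys(bookname + "/")
	if err != nil {
		return fmt.Errorf("listing %s: %w", bookname, err)
	}
	if len(keys) > 0 {
		return fmt.Errorf("book %q already exists; remove it or use a new name such as %q",
			bookname, bookname+"v2")
	}
	return nil
}
```

A caller would run `checkBookAbsent` before uploading any pages and stop on a non-nil error, leaving the choice between removing the old directory and renaming the book to the user.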