Age | Commit message | Author |
|
allOCRed was checking for wipePattern files. However, they should have
been transformed into the regular preprocessedPattern for OCR anyway,
so they shouldn't have been OCRed directly. As a result, allOCRed was
mistakenly looking for .hocr versions of the original wipePattern files,
which would never have been produced.
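As a rough illustration of the corrected check, here is a minimal Python sketch. It is not the project's code: the function name all_ocred, the two regular expressions, and the file naming scheme are all assumptions made for the example, since the real wipePattern and preprocessedPattern are not shown in this log. The point it illustrates is that only files matching the preprocessed pattern are expected to have a .hocr counterpart.

```python
import re

# Hypothetical patterns: the real wipePattern and preprocessedPattern
# are not shown in this log, so these regexes are assumptions.
WIPE_PATTERN = re.compile(r"^[0-9]+\.png$")
PREPROCESSED_PATTERN = re.compile(r"^[0-9]+_bin[0-9.]+\.png$")


def all_ocred(filenames):
    """Return True if every preprocessed page image has a .hocr file.

    Files matching WIPE_PATTERN are deliberately ignored: they are
    inputs to preprocessing, not to OCR, so no .hocr is expected.
    """
    names = set(filenames)
    preprocessed = [n for n in names if PREPROCESSED_PATTERN.match(n)]
    if not preprocessed:
        # Assumption: with nothing preprocessed yet, OCR cannot be done.
        return False
    for name in preprocessed:
        hocr = name[: -len(".png")] + ".hocr"
        if hocr not in names:
            return False
    return True
```

With these assumed patterns, a listing containing an original page image, its preprocessed version, and the matching .hocr counts as fully OCRed, even though the original page image has no .hocr of its own.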
|
parsed by graph, and add line or word option
|
to any queue
|
output
|
getpipelinebook
|
should have been handled by caller
|
be done and move on; not all books will have all page types (such as wipeonly books)
|
of the aws CommonPrefixes output
|
of time when there are large numbers of pages
|
be generated
|
This now concludes the OCR Page queue stuff; it should all be working
out of the box now.
|
This should be a good way to get around the ongoing heartbeat
issue, as individual page jobs will never come close to the
12 hour mark that can cause the bug.
The OCR page processing is done and working now; still to do
is to populate the page queue (rather than the ocr queue) after
preprocessing / wiping.
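The remaining step described above, queueing one job per page rather than one per book, could look roughly like the following Python sketch. Everything in it is hypothetical: the enqueue_pages name, the send callable standing in for whatever adds a message to the page queue, and the "book page" message format are assumptions for illustration, not the project's actual interface.

```python
def enqueue_pages(send, book, page_keys):
    """Queue one OCR job per page instead of one job per book.

    `send` is a hypothetical callable that adds a single message body
    to the page queue. The "<book> <page>" body format is an assumption.
    """
    count = 0
    for key in sorted(page_keys):
        send(f"{book} {key}")
        count += 1
    return count
```

Because each message covers a single page, no individual job should run long enough to approach the 12 hour mark mentioned above.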
|
will restart it soon thereafter)
|
This approach first sets the remaining visibility timeout to zero.
This should ensure that the message is available to re-find as soon
as the process looks for it.
Correspondingly, the delay between checks is much shorter, as there
shouldn't be any reason for a long delay.
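The approach described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the project's implementation: refind_message and its parameters are hypothetical names, and the client object is assumed to expose change_message_visibility and receive_message with the keyword arguments used by the boto3 SQS client.

```python
import time


def refind_message(client, queue_url, message, attempts=10, delay=0.5):
    """Make a message visible again, then receive it with a fresh handle.

    `client` is assumed to offer boto3-style SQS methods; `message` is a
    dict with "MessageId" and "ReceiptHandle" keys.
    """
    # Step 1: set the remaining visibility timeout to zero so the
    # message is available again as soon as we look for it.
    client.change_message_visibility(
        QueueUrl=queue_url,
        ReceiptHandle=message["ReceiptHandle"],
        VisibilityTimeout=0,
    )
    # Step 2: poll with only a short delay between checks, since the
    # message should already be visible.
    for _ in range(attempts):
        resp = client.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10)
        found = None
        for m in resp.get("Messages", []):
            if m["MessageId"] == message["MessageId"]:
                found = m
            else:
                # Release any other message picked up along the way.
                client.change_message_visibility(
                    QueueUrl=queue_url,
                    ReceiptHandle=m["ReceiptHandle"],
                    VisibilityTimeout=0,
                )
        if found is not None:
            return found
        time.sleep(delay)
    raise TimeoutError("message was not found again")
```

The returned message carries a new receipt handle, which is what a heartbeat would need in order to keep extending visibility. Releasing unrelated messages immediately is a design choice of this sketch, so that other workers are not starved while the search runs.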
|
option to download the original page images too
|