README


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50

# rescribe.xyz/bookpipeline package

This package contains various tools and functions for the OCR of
books, with a focus on distributed OCR using short-lived virtual
servers.

This is a Go package, and can be installed in the standard go way,
by running `go get rescribe.xyz/bookpipeline/...`

## Commands

The commands in the cmd/ directory are at the heart of this
package. For more details on their usage, use `go doc` or read
doc.go in the package repository.

The key commands for the virtual server side are:

  - bookpipeline    : processes items from queues, doing preprocessing,
                      ocr and postprocessing, and moving items on to
                      the next queue step on completion. this is the
                      core command of the package.
  - booktopipeline  : uploads a book to the pipeline and adds it to the
                      appropriate queue.
  - getpipelinebook : downloads the pipeline results for a book.
  - lspipeline      : prints useful information about the status of the
                      pipeline.
  - mkpipeline      : sets up storage buckets and queues for use by the
                      pipeline.
  - spotme          : starts up a short-lived virtual server running
                      bookpipeline.

There are also some commands which are more useful in a standalone
setting:

  - confgraph : creates a graph showing average word confidence of
                each page of hOCR in a directory
  - pagegraph : creates a graph showing average confidence of each
                word in a page of hOCR
  - pdfbook   : creates a searchable PDF from a directory of hOCR
                and image files

## Contributions

Any and all comments, bug reports, patches or pull requests would
be very welcomely received. Please email them to <nick@rescribe.xyz>.

## License

This package is licensed under the GPLv3. See the LICENSE file for
more details.