# rescribe.xyz/bookpipeline package

This package contains various tools and functions for the OCR of
books, with a focus on distributed OCR using short-lived virtual
servers.

This is a Go package, and can be installed in the standard Go way,
by running `go get rescribe.xyz/bookpipeline/...`. Its documentation
can be read with the `go doc` command or online at
<https://pkg.go.dev/rescribe.xyz/bookpipeline>.

If you just want to install and use the commands, you can get the
package with `git clone https://git.rescribe.xyz/bookpipeline`, and
then install them with `go install ./...` from within the
`bookpipeline` directory.

## Commands

The commands in the `cmd/` directory are at the heart of this
package. For more details on their usage, use `go doc` or read
`doc.go` in the package repository.

The key commands for the virtual server side are:

  - bookpipeline    : processes items from queues, doing preprocessing,
                      OCR and postprocessing, and moving items on to
                      the next queue step on completion. This is the
                      core command of the package.
  - booktopipeline  : uploads a book to the pipeline and adds it to the
                      appropriate queue.
  - getpipelinebook : downloads the pipeline results for a book.
  - lspipeline      : prints useful information about the status of the
                      pipeline.
  - mkpipeline      : sets up storage buckets and queues for use by the
                      pipeline.
  - spotme          : starts up a short-lived virtual server running
                      bookpipeline.

There are also some commands which are more useful in a standalone
setting:

  - confgraph : creates a graph showing average word confidence of
                each page of hOCR in a directory.
  - pagegraph : creates a graph showing average confidence of each
                word in a page of hOCR.
  - pdfbook   : creates a searchable PDF from a directory of hOCR
                and image files.
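The confidence graphs are built from the per-word confidence values stored in hOCR. As a rough sketch of the arithmetic behind a page's average word confidence, assuming Tesseract-style hOCR where each word's `title` attribute carries an integer `x_wconf` value, the calculation could look like this. The `avgWordConf` function is hypothetical, and it uses a simple regular expression rather than a full HTML parser; it is not the package's own code.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// wconfRe matches the x_wconf property that Tesseract writes into the
// title attribute of each word, e.g. "bbox 36 92 96 116; x_wconf 93".
var wconfRe = regexp.MustCompile(`x_wconf (\d+)`)

// avgWordConf returns the mean of all x_wconf values found in an hOCR
// document, and false if no word confidences were found.
func avgWordConf(hocr string) (float64, bool) {
	total, count := 0, 0
	for _, m := range wconfRe.FindAllStringSubmatch(hocr, -1) {
		n, err := strconv.Atoi(m[1])
		if err != nil {
			continue // skip values too large to parse
		}
		total += n
		count++
	}
	if count == 0 {
		return 0, false
	}
	return float64(total) / float64(count), true
}

func main() {
	page := `<span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 90'>Hello</span>
<span class='ocrx_word' title='bbox 104 92 180 116; x_wconf 81'>world</span>`
	if avg, ok := avgWordConf(page); ok {
		fmt.Printf("average word confidence: %.1f\n", avg) // prints 85.5
	}
}
```

Applied to each hOCR file in a directory, one such average per page gives the kind of data points a per-page confidence graph would plot.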

## Contributions

Any and all comments, bug reports, patches or pull requests are
very welcome. Please email them to <nick@rescribe.xyz>.

## License

This package is licensed under the GPLv3. See the LICENSE file for
more details.