# rescribe.xyz/bookpipeline package
This package contains various tools and functions for the OCR of
books, with a focus on distributed OCR using short-lived virtual
servers.
This is a Go package, and can be installed in the standard Go way,
by running `go get rescribe.xyz/bookpipeline/...`
## Commands
The commands in the cmd/ directory are at the heart of this
package. For more details on their usage, use `go doc` or read
doc.go in the package repository.
The key commands for the virtual server side are:
- bookpipeline : processes items from queues, doing preprocessing,
OCR and postprocessing, and moving items on to
the next queue step on completion. This is the
core command of the package.
- booktopipeline : uploads a book to the pipeline and adds it to the
appropriate queue.
- getpipelinebook : downloads the pipeline results for a book.
- lspipeline : prints useful information about the status of the
pipeline.
- mkpipeline : sets up storage buckets and queues for use by the
pipeline.
- spotme : starts up a short-lived virtual server running
bookpipeline.
There are also some commands which are more useful in a standalone
setting:
- confgraph : creates a graph showing average word confidence of
each page of hOCR in a directory.
- pagegraph : creates a graph showing average confidence of each
word in a page of hOCR.
- pdfbook : creates a searchable PDF from a directory of hOCR
and image files.
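
To illustrate what "average word confidence" means for a page of hOCR, here is a minimal sketch in Go. It is not the implementation used by confgraph or pagegraph. It assumes that word confidences are recorded with the standard hOCR `x_wconf` property inside each word's `title` attribute, and it uses a simple regular expression where a real tool would more likely use a proper HTML or XML parser. The function name `averageWordConfidence` and the sample page are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// wconfRe matches the hOCR x_wconf property, which records per-word
// recognition confidence, e.g. title='bbox 10 10 50 30; x_wconf 95'.
// Assumption: confidences are written as plain decimal numbers.
var wconfRe = regexp.MustCompile(`x_wconf\s+(\d+(?:\.\d+)?)`)

// averageWordConfidence returns the mean of all x_wconf values found
// in an hOCR document. The boolean is false when none are present.
// This is a hypothetical helper, not part of the bookpipeline package.
func averageWordConfidence(hocr string) (float64, bool) {
	var total float64
	var count int
	for _, m := range wconfRe.FindAllStringSubmatch(hocr, -1) {
		v, err := strconv.ParseFloat(m[1], 64)
		if err != nil {
			continue // skip anything that does not parse as a number
		}
		total += v
		count++
	}
	if count == 0 {
		return 0, false
	}
	return total / float64(count), true
}

func main() {
	// A tiny made-up hOCR fragment containing two recognised words.
	page := `<span class='ocrx_word' title='bbox 10 10 50 30; x_wconf 95'>Hello</span>
<span class='ocrx_word' title='bbox 60 10 110 30; x_wconf 85'>world</span>`

	if avg, ok := averageWordConfidence(page); ok {
		fmt.Printf("average word confidence: %.1f\n", avg)
	} else {
		fmt.Println("no word confidences found")
	}
}
```

Running the same calculation over every hOCR file in a directory would give one value per page, which is the kind of data a per-page confidence graph plots.
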
## Contributions
Comments, bug reports, patches and pull requests are all very
welcome. Please email them to <nick@rescribe.xyz>.
## License
This package is licensed under the GPLv3. See the LICENSE file for
more details.