author Nick White <> 2020-11-11 17:32:01 +0000
committer Nick White <> 2020-11-11 17:32:01 +0000
Add desktop-tool draft
+title: "Desktop Tool"
+date: 2020-11-11
+categories: [software, code, tools]
+While [our pipeline](/posts/tool-overview) works well for efficiently
+OCRing a corpus using cloud servers, it has been hard to get the
+features of the pipeline on your own computer. So we recently spent
+a bit of time creating a new tool designed to run self-contained
+on a desktop computer. We're calling the tool *rescribe*,
+because why not? At the moment it's a command-line-only tool.
+## Install and build
+*rescribe* is a part of our [bookpipeline](
+package, and is written in Go, so there are a few things you need
+to install to get it working.
+1. Firstly, you need to [download and install the Go tools](,
+which will be used to build and install *rescribe*.
+2. Next, you need to install the Tesseract OCR engine, which the
+tool uses for the core OCR step. If you're on Linux this should be
+available from your package manager; otherwise
+[follow these instructions if you're on a Mac](, or
+[download and run an installer from this page for Windows](
+3. Then you'll need to install [git](
+if you don't already have it, so you can get the bookpipeline package.
+4. Download an OCR training set for the language you're interested in.
+We provide trainings for [Caroline Minuscule](,
+[early printed Latin]( and
+[Ancient Greek](
+Still here? Great. Now open up a terminal window. Don't worry, it
+will be worth it.
+1. Clone the latest version of the bookpipeline package:
+`git clone`
+2. Change into the bookpipeline directory and build the rescribe tool:
+cd bookpipeline
+go build ./cmd/rescribe
+Now everything is ready for action, and there will be an executable
+inside the bookpipeline directory called *rescribe* (*rescribe.exe*
+on Windows).
+## Usage
+You use *rescribe* by giving it the path of a training file to use
+and the directory containing the book or manuscript pages you want
+to OCR. Basic usage looks like this:
+rescribe -t ../trainings/carolinems.traineddata mybook
+This will run rescribe with a training at
+*../trainings/carolinems.traineddata* over all pages in the
+directory *mybook*.
+One limitation at the moment is that *rescribe* is very sensitive
+to how page images are named. It will only work on pages named
+`<anything>0001.png` or `<anything>0001.jpg`, where *`0001`* is any
+four-digit number (and *`<anything>`* is anything!).
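
Because the naming rule above is easy to trip over, it can help to check a directory before running the tool. The following is a minimal sketch in Python, not part of *rescribe* itself. The regular expression is an assumption that mirrors the rule as described in the post (any prefix, then four digits, then a lowercase `.png` or `.jpg` extension); the tool's actual matching may differ, for example in case sensitivity. The function names `is_valid_page_name`, `check_names` and `check_directory` are hypothetical helpers introduced for illustration.

```python
import re
from pathlib import Path

# Assumption: this pattern restates the naming rule from the post.
# It is not taken from rescribe's source code.
PAGE_NAME = re.compile(r".*[0-9]{4}\.(png|jpg)")


def is_valid_page_name(name):
    """Return True if name looks like <anything>0001.png or <anything>0001.jpg."""
    return PAGE_NAME.fullmatch(name) is not None


def check_names(names):
    """Split file names into (matching, non-matching), each sorted."""
    ok = sorted(n for n in names if is_valid_page_name(n))
    bad = sorted(n for n in names if not is_valid_page_name(n))
    return ok, bad


def check_directory(directory):
    """Apply check_names to the regular files directly inside directory."""
    names = [p.name for p in Path(directory).iterdir() if p.is_file()]
    return check_names(names)
```

Calling `check_directory("mybook")` returns two lists: the page images that fit the assumed pattern, and the files that would need renaming before running the tool.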