---
title: "Desktop Tool"
date: 2020-11-11
categories: [software, code, tools]
---

While [our pipeline](/posts/tool-overview) works well for efficiently OCRing a whole corpus using cloud servers, it was hard to get the features of the pipeline on your own computer. So we spent a bit of time recently creating a new tool which is designed to run self-contained on a desktop computer. We're calling the tool *rescribe*, because why not? At the moment it's a command-line-only tool.

## Install and build

*rescribe* is a part of our [bookpipeline](https://rescribe.xyz/bookpipeline) package, and is written in Go, so there are a few things you need to install to get it working.

1. Firstly, you need to [download and install the Go tools](https://golang.org/dl/), which will be used to build and install *rescribe*.
2. Next, you need to install the Tesseract OCR engine, which the tool uses for the core OCR step. If you're on Linux this should be available from your package manager, [follow these instructions if you're on a Mac](https://tesseract-ocr.github.io/tessdoc/Home.html#macos), or [download and run an installer from this page for Windows](https://github.com/UB-Mannheim/tesseract/wiki).
3. Then you'll need to install [git](https://git-scm.com/downloads) if you don't already have it, so you can get the bookpipeline package.
4. Download an OCR training set for the language you're interested in. We provide trainings for [Caroline Minuscule](https://manuscriptocr.org), [early printed Latin](https://latinocr.org) and [Ancient Greek](https://ancientgreekocr.org).

Still here? Great. Now open up a terminal window. Don't worry, it will be worth it.

1. Clone the latest version of the bookpipeline package: `git clone https://git.rescribe.xyz/bookpipeline`
2. Change into the bookpipeline directory and build the rescribe tool:
   ```
   cd bookpipeline
   go build ./cmd/rescribe
   ```

Now everything is ready for action, and there will be an executable inside the bookpipeline directory called *rescribe* (*rescribe.exe* on Windows).

## Usage

You use *rescribe* by giving it the path of a training file to use and the directory containing the book or manuscript pages you want to OCR. Basic usage looks like this:

```
rescribe -t ../trainings/carolinems.traineddata mybook
```

This will run rescribe with the training at *../trainings/carolinems.traineddata* over all pages in the directory *mybook*.

One limitation at the moment is that *rescribe* is very sensitive to how page images are named. It will only work on pages named `0001.png` or `0001.jpg`, where *`0001`* is any four-digit number (and *``* is anything!).
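
If your scans don't follow that naming scheme yet, a small script can copy them into a fresh directory under conforming names. The following is a minimal Python sketch, not part of *rescribe* itself: the function names `is_valid_page_name`, `plan_renames` and `copy_renamed` are made up for illustration, and it assumes the rule is exactly four digits followed by a lowercase `.png` or `.jpg`, which is only what the paragraph above states. It copies rather than renames in place, so the original files stay untouched.

```python
import re
import shutil
from pathlib import Path

# Assumption: a usable page name is exactly four digits plus .png or .jpg.
PAGE_NAME = re.compile(r"[0-9]{4}\.(png|jpg)")
IMAGE_SUFFIXES = {".png", ".jpg"}


def is_valid_page_name(name: str) -> bool:
    """Return True if name looks like 0001.png or 0001.jpg."""
    return PAGE_NAME.fullmatch(name) is not None


def natural_key(name: str) -> list:
    """Sort key that orders page2.png before page10.png."""
    parts = re.split(r"(\d+)", name)
    # re.split with a capturing group puts the digit runs at odd indexes.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def plan_renames(names: list) -> list:
    """Pair each image name with a sequential four-digit name."""
    images = sorted(
        (n for n in names if Path(n).suffix.lower() in IMAGE_SUFFIXES),
        key=natural_key,
    )
    if len(images) > 9999:
        raise ValueError("too many pages to number with four digits")
    return [
        (old, f"{i:04d}{Path(old).suffix.lower()}")
        for i, old in enumerate(images, start=1)
    ]


def copy_renamed(src: str, dst: str) -> None:
    """Copy every .png/.jpg in src into dst under its planned name."""
    src_dir, dst_dir = Path(src), Path(dst)
    dst_dir.mkdir(parents=True, exist_ok=True)
    names = [p.name for p in src_dir.iterdir() if p.is_file()]
    for old, new in plan_renames(names):
        shutil.copy2(src_dir / old, dst_dir / new)
```

Calling `copy_renamed("scans", "mybook")` would fill *mybook* with `0001.png`, `0002.png` and so on, in the natural sort order of the original names, ready to pass to *rescribe* as shown above. Check that order against your page sequence before running the OCR, as filenames don't always sort the way the book reads.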