|author||Nick White <firstname.lastname@example.org>||2020-11-11 17:32:01 +0000|
|committer||Nick White <email@example.com>||2020-11-11 17:32:01 +0000|
Add desktop-tool draft
1 files changed, 65 insertions, 0 deletions
diff --git a/content/posts/desktop-tool/index.md b/content/posts/desktop-tool/index.md
new file mode 100644
@@ -0,0 +1,65 @@
+title: "Desktop Tool"
+categories: [software, code, tools]
+While [our pipeline](/posts/tool-overview) works well for efficiently
+OCRing a whole corpus on cloud servers, it has been hard to get the
+features of the pipeline on your own computer. So we recently spent
+a bit of time creating a new tool designed to run self-contained
+on a desktop computer. We're calling the tool *rescribe*,
+because why not? At the moment it's a command-line-only tool.
+## Install and build
+*rescribe* is a part of our [bookpipeline](https://rescribe.xyz/bookpipeline)
+package, and is written in Go, so there are a few things you need
+to install to get it working.
+1. Firstly, you need to [download and install the Go tools](https://golang.org/dl/),
+which will be used to build and install *rescribe*.
+2. Next, you need to install the Tesseract OCR engine, which the
+tool uses for the core OCR step. If you're on Linux this should be
+available from your package manager. Otherwise,
+[follow these instructions if you're on a Mac](https://tesseract-ocr.github.io/tessdoc/Home.html#macos), or
+[download and run an installer from this page for Windows](https://github.com/UB-Mannheim/tesseract/wiki).
+3. Then you'll need to install [git](https://git-scm.com/downloads)
+if you don't already have it, so you can get the bookpipeline package.
+4. Download an OCR training set for the language you're interested in.
+We provide trainings for [Caroline Minuscule](https://manuscriptocr.org),
+[early printed Latin](https://latinocr.org) and
+Still here? Great. Now open up a terminal window. Don't worry, it
+will be worth it.
+1. Clone the latest version of the bookpipeline package:
+`git clone https://git.rescribe.xyz/bookpipeline`
+2. Change into the bookpipeline directory and build the rescribe tool:
+`go build ./cmd/rescribe`
+Now everything is ready for action, and there will be an executable
+inside the bookpipeline directory called *rescribe* (*rescribe.exe*
+on Windows).
+You use *rescribe* by giving it the path of a training file to use
+and the directory containing the book or manuscript pages you want
+to OCR. Basic usage looks like this:
+`rescribe -t ../trainings/carolinems.traineddata mybook`
+This will run *rescribe* with the training at
+*../trainings/carolinems.traineddata* over all pages in the
+*mybook* directory.
+One limitation at the moment is that *rescribe* is very sensitive
+to how page images are named. It will only work on pages named
+`<anything>0001.png` or `<anything>0001.jpg`, where *`0001`* stands for
+any four-digit number (and *`<anything>`* is anything!).
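
Because the naming rule is easy to trip over, it can help to check a directory before running the tool. The following is a minimal sketch, not part of *rescribe* itself: the function names `is_page_name` and `list_misnamed_pages` are made up for illustration, and it assumes the extension is matched case-sensitively, which may differ from what *rescribe* actually does.

```shell
#!/bin/sh
# Hypothetical helpers, not part of rescribe.

# Return 0 if a file name ends in four digits followed by .png or .jpg,
# the pattern described above; return 1 otherwise.
# Assumption: the extension is lower case.
is_page_name() {
    case "$1" in
        *[0-9][0-9][0-9][0-9].png | *[0-9][0-9][0-9][0-9].jpg) return 0 ;;
        *) return 1 ;;
    esac
}

# Print every entry in directory $1 whose name does not fit the pattern.
list_misnamed_pages() {
    for f in "$1"/*; do
        [ -e "$f" ] || continue   # skip the unexpanded glob in an empty directory
        is_page_name "$(basename "$f")" || printf '%s\n' "$f"
    done
}
```

After sourcing these functions, `list_misnamed_pages mybook` would list any files in *mybook* that probably need renaming before being passed to *rescribe*.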