Improve desktop-tool page

author: Nick White <git@njw.name> 2020-11-17 14:28:54 +0000
committer: Nick White <git@njw.name> 2020-11-17 14:28:54 +0000
commit: 852edcd1c8ff81e8701ebd97dd48c18afa741f08 (patch)
tree: 9be3c76802ba5b1ba4867578bbba3dee444591a6 /content/posts
parent: 70959ff850ebca6e38c947d03b86efc94f8a8dd4 (diff)
1 files changed, 43 insertions, 29 deletions
diff --git a/content/posts/desktop-tool/index.md b/content/posts/desktop-tool/index.md
index 66c6b9f..21deb69 100644
--- a/content/posts/desktop-tool/index.md
+++ b/content/posts/desktop-tool/index.md
@@ -10,56 +10,70 @@ time recently creating a new tool which is designed to run self-
 contained on a desktop computer. We're calling the tool *rescribe*,
 because why not? At the moment it's a command line only tool.
 
-## Install and build
+## Install dependencies
 
 *rescribe* is a part of our [bookpipeline](https://rescribe.xyz/bookpipeline)
-package, and is written in Go, so there are a few things you need
-to install to get it working.
+package, and we provide pre-built executables for it, which can be
+downloaded for each platform here:
 
-1. Firstly, you need to [download and install the Go tools](https://golang.org/dl/),
-which will be used to build and install *rescribe*.
-2. Next, you need to install the Tesseract OCR engine, which the
+* [Linux](https://rescribe.xyz/rescribe/0.3.0/rescribe)
+* [OS X](https://rescribe.xyz/rescribe/0.3.0/osx/rescribe)
+* [Windows](https://rescribe.xyz/rescribe/0.3.0/rescribe.exe)
+
+Note that if you're on Linux or OS X you will probably need to run
+`chmod +x rescribe` after downloading, to make it executable.
+
+Next, you need to install the Tesseract OCR engine, which the
 tool uses for the core OCR step. If you're on Linux this should be
 available from your package manager,
 [follow these instructions if you're on a Mac](https://tesseract-ocr.github.io/tessdoc/Home.html#macos), or 
 [download and run an installer from this page for Windows](https://github.com/UB-Mannheim/tesseract/wiki).
-3. Then you'll need to install [git](https://git-scm.com/downloads)
-if you don't already have it, so you can get the bookpipeline package.
-4. Download an OCR training set for the language you're interested in.
-We provide trainings for [Caroline Miniscule](https://manuscriptocr.org),
-[early printed Latin](https://latinocr.org) and
-[Ancient Greek](https://ancientgreekocr.org).
-
-Still here? Great. Now open up a terminal window. Don't worry, it
-will be worth it.
-
-1. Clone the latest version of the bookpipeline package:
-`git clone https://git.rescribe.xyz/bookpipeline`
-2. Change into the bookpipeline directory and build the rescribe tool:
-```
-cd bookpipeline
-go build ./cmd/rescribe
-```
 
-Now everything is ready for action, and there will be an executable
-inside the bookpipeline directory called *rescribe* (*rescribe.exe*
-on Windows).
+Finally, you will need to download an OCR training set for the
+language / script you're interested in. We provide trainings for
+[Caroline Minuscule](https://manuscriptocr.org),
+[early printed Latin](https://latinocr.org) and
+[Ancient Greek](https://ancientgreekocr.org). Any other Tesseract
+OCR training set will also work fine.
 
 ## Usage
 
+Still here? Great. Now open up a terminal window. Don't worry, it
+will be worth it. If you're on Windows, you can type `cmd.exe` into
+the Run box; on OS X it's under Applications -> Utilities -> Terminal;
+and if you're on Linux I bet you already know where to find your
+terminal.
+
 You use *rescribe* by giving it the path of a training file to use
 and the directory containing the book or manuscript pages you want
 to OCR. Basic usage looks like this:
 ```
-rescribe -t ../trainings/carolinems.traineddata mybook
+./rescribe -t ../trainings/carolinems.traineddata mybook
 ```
 This will run rescribe with a training at
 *../trainings/carolinems.traineddata* over all pages in the
-directory *mybook*.
+directory *mybook*. A successful run will add several new files to
+*mybook*:
+
+* A PDF file named after the directory (`mybook.pdf` in the above
+  example), which is fully searchable.
+* A `text` directory, containing plain text versions of the OCR
+  results for each page.
+* A `hocr` directory, containing hOCR formatted OCR results for each
+  page.
+* A `graph.png` file, which shows the OCR confidence of each page (a
+  rough indicator of the quality of the OCR over the book).
+* A `conf` file, which lists the OCR confidence of each page, at each
+  preprocessing [binarisation threshold](/posts/adaptive-binarisation)
+  attempted.
+
+## Limitations
 
 One limitation at the moment is that *rescribe* is very sensitive
 to how page images are named. It will only work on pages named
 `<anything>0001.png` or `<anything>0001.jpg`, where *`0001`* is any
 four digit number (and *`<anything>`* is anything!).
 
-
+There are likely to be bugs! [Let us know](mailto:info@rescribe.xyz)
+of any issues you have, any features you'd like, or just that you're
+enjoying using it!
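
A note on the naming rule described under *Limitations*: a quick pre-flight check can catch badly named page images before an OCR run. The sketch below is not part of *rescribe*; the function names `is_valid_page_name` and `misnamed_pages` are hypothetical, and it assumes the rule means exactly four ASCII digits directly before a lowercase `.png` or `.jpg` extension (whether *rescribe* accepts uppercase extensions is not stated in the post).

```python
import re
from pathlib import Path

# Assumed reading of the rule: any prefix (possibly empty), then four ASCII
# digits immediately before a lowercase .png or .jpg extension.
PAGE_NAME = re.compile(r".*[0-9]{4}\.(png|jpg)")


def is_valid_page_name(name):
    """Return True if a file name fits the assumed <anything>0001.png/.jpg rule."""
    return PAGE_NAME.fullmatch(name) is not None


def misnamed_pages(directory):
    """Return the sorted names of files in `directory` that do not fit the rule."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and not is_valid_page_name(p.name)
    )


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_pages.py mybook
    for name in misnamed_pages(sys.argv[1]):
        print("would likely be skipped or rejected:", name)
```

This is meant to be run on the page directory before the first OCR run, since output files such as `graph.png` and `mybook.pdf` do not fit the pattern and would be reported too.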