author: Nick White <git@njw.name> 2019-10-22 12:29:03 +0100
committer: Nick White <git@njw.name> 2019-10-22 12:29:03 +0100
commit: 77d16fa1795b9387f70f7b2eac31faa03f7c30d0
tree: 3169201447c858dda78ee6bb4c9c2a7fc91415a8 /content
parent: 1327179a1388e26fca276714f5f6f4bcc65785b6
Improve clarity and wording of binarisation introduction, and add links to our git repo in who we are
Diffstat (limited to 'content')
-rw-r--r-- content/posts/binarisation-introduction/index.md 10
-rw-r--r-- content/who-we-are.md 3
2 files changed, 9 insertions, 4 deletions
diff --git a/content/posts/binarisation-introduction/index.md b/content/posts/binarisation-introduction/index.md
index 5dfbabf..005f2ee 100644
--- a/content/posts/binarisation-introduction/index.md
+++ b/content/posts/binarisation-introduction/index.md
@@ -1,7 +1,6 @@
---
title: "An Introduction to Binarisation"
-date: 2019-10-09
-draft: true
+date: 2019-10-22
categories: [binarisation, preprocessing, image manipulation]
---
Binarisation is the process of turning a colour or grayscale image into
@@ -12,6 +11,13 @@ makes various image manipulation tasks much more straightforward. OCR is
one such process, and all major OCR engines today work on binarised
images.
+Poor binarisation has been a key cause of poor OCR results for our work,
+so we have spent some time looking into better solutions to improve our
+results. We now generally pre-binarise page images before sending them
+to an OCR engine, which has yielded significant quality improvements to
+our OCR results. Before we get to that, it's worth looking in depth at
+how different binarisation methods work.
+
Binarisation sounds pretty straightforward, and in the ideal case it is.
You can pick a number, and go through each pixel in the image, checking
if the pixel is lighter than the number, and if so declaring it to be
diff --git a/content/who-we-are.md b/content/who-we-are.md
index 2c7032d..5f237ab 100644
--- a/content/who-we-are.md
+++ b/content/who-we-are.md
@@ -2,8 +2,7 @@
title: "Who we are"
date: 2019-02-11
menu: "main"
-draft: true
---
-Rescribe is a not-for-profit company focused on improving the state of OCR and related technologies for historical books and documents. Free and open source software is key to the work we do, and we release all the code and training data we create and use on [github](https://github.com/rescribe).
+Rescribe is a not-for-profit company focused on improving the state of OCR and related technologies for historical books and documents. Free and open source software is key to the work we do, and we release all the code and training data we create and use on [our git server](https://git.rescribe.xyz) (also mirrored on [github](https://github.com/rescribe)).
We work with a variety of academic and archival projects to make historical works more accessible, searchable and discoverable, and to enable researchers to work with them and find new connections.
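
The simple approach described in the binarisation post above (pick a number, then compare every pixel against it) can be illustrated with a minimal sketch. This is not code from the commit or from Rescribe's tools: the function name `binarise`, the default threshold of 128, and the list-of-lists image representation are all assumptions made for illustration. The hunk also stops mid-sentence, so mapping lighter pixels to white and everything else to black is an assumption too.

```python
def binarise(pixels, threshold=128):
    """Global thresholding sketch (hypothetical helper, not from the commit).

    pixels is a grayscale image given as rows of integers in 0..255.
    Pixels lighter than the threshold become white (255); all other
    pixels become black (0). This mapping is assumed, since the source
    text is cut off before it states the outcome.
    """
    return [[255 if p > threshold else 0 for p in row] for row in pixels]


# A tiny 2x3 "page": light background with two dark ink pixels.
page = [
    [250, 240, 30],
    [245, 20, 235],
]
result = binarise(page)
```

A single fixed number like this only suits the ideal case the post mentions; the threshold would need to be chosen per image in practice.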