author: Nick White <git@njw.name> 2019-10-22 12:29:03 +0100
committer: Nick White <git@njw.name> 2019-10-22 12:29:03 +0100
commit: 77d16fa1795b9387f70f7b2eac31faa03f7c30d0
tree: 3169201447c858dda78ee6bb4c9c2a7fc91415a8 /content/posts
parent: 1327179a1388e26fca276714f5f6f4bcc65785b6
Improve clarity and wording of binarisation introduction, and add links to our git repo in who we are
Diffstat (limited to 'content/posts')
-rw-r--r-- content/posts/binarisation-introduction/index.md | 10
1 files changed, 8 insertions, 2 deletions
diff --git a/content/posts/binarisation-introduction/index.md b/content/posts/binarisation-introduction/index.md
index 5dfbabf..005f2ee 100644
--- a/content/posts/binarisation-introduction/index.md
+++ b/content/posts/binarisation-introduction/index.md
@@ -1,7 +1,6 @@
---
title: "An Introduction to Binarisation"
-date: 2019-10-09
-draft: true
+date: 2019-10-22
categories: [binarisation, preprocessing, image manipulation]
---
Binarisation is the process of turning a colour or grayscale image into
@@ -12,6 +11,13 @@ makes various image manipulation tasks much more straightforward. OCR is
one such process, and all major OCR engines today work on binarised
images.
+Poor binarisation has been a key cause of poor OCR results for our work,
+so we have spent some time looking into better solutions to improve our
+results. We now generally pre-binarise page images before sending them
+to an OCR engine, which has yielded significant quality improvements to
+our OCR results. Before we get to that, it's worth looking in depth at
+how different binarisation methods work.
+
Binarisation sounds pretty straightforward, and in the ideal case it is.
You can pick a number, and go through each pixel in the image, checking
if the pixel is lighter than the number, and if so declaring it to be