From 77d16fa1795b9387f70f7b2eac31faa03f7c30d0 Mon Sep 17 00:00:00 2001
From: Nick White
Date: Tue, 22 Oct 2019 12:29:03 +0100
Subject: Improve clarity and wording of binarisation introduction, and add links to our git repo in who we are

---
 content/posts/binarisation-introduction/index.md | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

(limited to 'content/posts/binarisation-introduction/index.md')

diff --git a/content/posts/binarisation-introduction/index.md b/content/posts/binarisation-introduction/index.md
index 5dfbabf..005f2ee 100644
--- a/content/posts/binarisation-introduction/index.md
+++ b/content/posts/binarisation-introduction/index.md
@@ -1,7 +1,6 @@
 ---
 title: "An Introduction to Binarisation"
-date: 2019-10-09
-draft: true
+date: 2019-10-22
 categories: [binarisation, preprocessing, image manipulation]
 ---
 Binarisation is the process of turning a colour or grayscale image into
@@ -12,6 +11,13 @@ makes various image manipulation tasks much more straightforward.
 OCR is one such process, and all major OCR engines today work on
 binarised images.
 
+Poor binarisation has been a key cause of poor OCR results for our work,
+so we have spent some time looking into better solutions to improve our
+results. We now generally pre-binarise page images before sending them
+to an OCR engine, which has yielded significant quality improvements to
+our OCR results. Before we get to that, it's worth looking in depth at
+how different binarisation methods work.
+
 Binarisation sounds pretty straightforward, and in the ideal case it is.
 You can pick a number, and go through each pixel in the image, checking
 if the pixel is lighter than the number, and if so declaring it to be
-- 
cgit v1.2.1-24-ge1ad