author | Nick White <git@njw.name> | 2019-10-22 12:29:03 +0100 |
---|---|---|
committer | Nick White <git@njw.name> | 2019-10-22 12:29:03 +0100 |
commit | 77d16fa1795b9387f70f7b2eac31faa03f7c30d0 | |
tree | 3169201447c858dda78ee6bb4c9c2a7fc91415a8 /content/posts | |
parent | 1327179a1388e26fca276714f5f6f4bcc65785b6 | |
Improve clarity and wording of binarisation introduction, and add links to our git repo in "who we are"
Diffstat (limited to 'content/posts')
-rw-r--r-- | content/posts/binarisation-introduction/index.md | 10 |
1 files changed, 8 insertions, 2 deletions
diff --git a/content/posts/binarisation-introduction/index.md b/content/posts/binarisation-introduction/index.md
index 5dfbabf..005f2ee 100644
--- a/content/posts/binarisation-introduction/index.md
+++ b/content/posts/binarisation-introduction/index.md
@@ -1,7 +1,6 @@
 ---
 title: "An Introduction to Binarisation"
-date: 2019-10-09
-draft: true
+date: 2019-10-22
 categories: [binarisation, preprocessing, image manipulation]
 ---
 Binarisation is the process of turning a colour or grayscale image into
@@ -12,6 +11,13 @@ makes various image manipulation tasks much more straightforward.
 OCR is one such process, and all major OCR engines today work on
 binarised images.
 
+Poor binarisation has been a key cause of poor OCR results for our work,
+so we have spent some time looking into better solutions to improve our
+results. We now generally pre-binarise page images before sending them
+to an OCR engine, which has yielded significant quality improvements to
+our OCR results. Before we get to that, it's worth looking in depth at
+how different binarisation methods work.
+
 Binarisation sounds pretty straightforward, and in the ideal case it is.
 You can pick a number, and go through each pixel in the image, checking
 if the pixel is lighter than the number, and if so declaring it to be
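The context lines at the end of the diff describe the simplest approach: pick one number and compare every pixel against it. Below is a minimal sketch of that idea. It assumes a grayscale image stored as a list of rows of integers from 0 (black) to 255 (white), and, because the excerpt is cut off at "declaring it to be", it also assumes that pixels lighter than the threshold become white and all others black. The function name `fixed_threshold` is hypothetical and not taken from the post.

```python
from typing import List

# Assumed representation: a grayscale image as a list of rows,
# each pixel an integer from 0 (black) to 255 (white).
Image = List[List[int]]


def fixed_threshold(image: Image, threshold: int) -> Image:
    """Binarise a grayscale image with one global threshold (hypothetical helper).

    Assumption: pixels strictly lighter than `threshold` become white (255),
    every other pixel becomes black (0).
    """
    return [[255 if pixel > threshold else 0 for pixel in row] for row in image]


if __name__ == "__main__":
    page = [
        [250, 240, 30],
        [200, 128, 10],
    ]
    # A pixel exactly equal to the threshold is not "lighter", so it becomes black.
    print(fixed_threshold(page, 128))
    # [[255, 255, 0], [255, 0, 0]]
```

This only behaves well in the ideal case the post mentions: a single number applied to the whole page has no way to adapt to, for example, lighting that varies across the image.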