Diffstat (limited to 'content/posts')
-rw-r--r-- | content/posts/binarisation-introduction/index.md | 10 |
1 files changed, 8 insertions, 2 deletions
diff --git a/content/posts/binarisation-introduction/index.md b/content/posts/binarisation-introduction/index.md
index 5dfbabf..005f2ee 100644
--- a/content/posts/binarisation-introduction/index.md
+++ b/content/posts/binarisation-introduction/index.md
@@ -1,7 +1,6 @@
 ---
 title: "An Introduction to Binarisation"
-date: 2019-10-09
-draft: true
+date: 2019-10-22
 categories: [binarisation, preprocessing, image manipulation]
 ---
 Binarisation is the process of turning a colour or grayscale image into
@@ -12,6 +11,13 @@ makes various image manipulation tasks much more straightforward.
 OCR is one such process, and all major OCR engines today work on
 binarised images.
 
+Poor binarisation has been a key cause of poor OCR results for our work,
+so we have spent some time looking into better solutions to improve our
+results. We now generally pre-binarise page images before sending them
+to an OCR engine, which has yielded significant quality improvements to
+our OCR results. Before we get to that, it's worth looking in depth at
+how different binarisation methods work.
+
 Binarisation sounds pretty straightforward, and in the ideal case it is.
 You can pick a number, and go through each pixel in the image, checking
 if the pixel is lighter than the number, and if so declaring it to be
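
The closing context lines of the diff describe a simple global threshold: pick a number and compare every pixel against it. A minimal sketch of that idea follows. It is an illustration, not code from the post or the repository. It assumes an 8-bit grayscale image stored as nested lists, where 0 is black and 255 is white, and it assumes that lighter pixels become white, since the excerpt is cut off before it says so. The name `binarise` and the default threshold of 128 are hypothetical.

```python
def binarise(pixels, threshold=128):
    """Apply one global threshold to a grayscale image.

    Assumption: `pixels` is a list of rows of 8-bit values (0 = black,
    255 = white). A pixel strictly lighter than `threshold` becomes
    white (255); every other pixel becomes black (0).
    """
    return [[255 if p > threshold else 0 for p in row] for row in pixels]


# Tiny example "page": light background with a few dark ink pixels.
page = [
    [250, 240, 30],
    [245, 20, 25],
]
binary = binarise(page)
```

A single fixed threshold like this only suits the ideal case the post mentions, where the page is evenly lit and ink and background are clearly separated.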