From 36c3400315b6be253590092f6863453d74e64b65 Mon Sep 17 00:00:00 2001
From: Nick White
Date: Wed, 9 Oct 2019 20:03:48 +0100
Subject: Improve binarisation introduction wording, and add a couple more images

---
 content/posts/binarisation-introduction/index.md | 27 ++++++++++++++----------
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/content/posts/binarisation-introduction/index.md b/content/posts/binarisation-introduction/index.md
index 085b8c5..041f6fa 100644
--- a/content/posts/binarisation-introduction/index.md
+++ b/content/posts/binarisation-introduction/index.md
@@ -30,7 +30,7 @@ threshold that is too low will result in too many pixels being marked as
 black, which for OCR means that various non-text noise will be included
 and considered by the OCR engine, again reducing accuracy.
 
-( INSERT IMAGES DEMONSTRATING EACH )
+![Example of simple thresholding errors](example-02.png)
 
 If all page images were printed exactly the same way, and scanned the
 same way, we could probably get away with just picking an appropriate
@@ -50,17 +50,22 @@ common foreground.
 
 Otsu's algorithm works well for well printed material, on good paper,
 which has been well scanned, as the brightness of the background and
-foreground pixels is consistent. It works less well for pages which
-been scanned with have uneven lighting, as the background brightness
-may be quite different for one corner of a page than another. It is
-also not too good at handling paper or ink inconsistencies, such as
-blemishes, splotches or page grain, as they may well have parts
-which are darker than the threshold.
+foreground pixels is consistent.
 
-( INSERT IMAGES DEMONSTRATING OTSU FAILING ON BAD LIGHTING AND WITH
- SPLOTCHES IN PAGE BEING BLACKENED )
+However, even with the most perfectly chosen threshold number, there
+are certain cases that no global threshold binarisation can do a good
+job at.
+Pages which have been scanned with uneven lighting do badly,
+as the background brightness may be quite different for one corner of
+a page than another. Global threshold binarisation can also have
+problems with paper or ink inconsistencies, such as blemishes,
+splotches or page grain, as they may well have parts which are darker
+than the global threshold.
+
+{{< figure src="example-03.png" caption="Example of an image that can't be satisfyingly binarised using any global threshold." >}}
+
+
 
 Both of these criticisms could be addressed by using an algorithm that
 could alter the threshold according to the conditions of the region on
-the page. That will be covered in the next blog post,
-[Adaptive Binarisation](/posts/adaptive-binarisation).
+the page. That will be covered in the next blog post.
-- 
cgit v1.2.1-24-ge1ad
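The new paragraph in this patch argues that no global threshold, however well chosen, can cope with uneven lighting. As a rough illustration, here is a minimal Python sketch. It is not code from the post or its repository. It assumes 8-bit greyscale pixels (0 = black, 255 = white) held in a plain list, and `binarise` and `otsu_threshold` are hypothetical helper names. The sketch picks a threshold with Otsu's method, maximising the between-class variance over the histogram. It then checks a synthetic, partly shadowed row of pixels and finds that no threshold from 0 to 256 classifies it correctly.

```python
"""Sketch: global thresholding, Otsu's threshold choice, and why uneven
lighting defeats any single global threshold. Assumes 8-bit grey values
(0 = black, 255 = white) in plain Python lists."""


def binarise(pixels, threshold):
    """Global thresholding: mark a pixel as ink (True) if it is darker
    than the single threshold used for the whole page."""
    return [p < threshold for p in pixels]


def otsu_threshold(pixels):
    """Return the threshold t maximising Otsu's between-class variance,
    where the two classes are grey values < t and grey values >= t."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))

    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(1, 256):
        # Move grey level t - 1 from the light class into the dark class.
        w_dark += hist[t - 1]
        sum_dark += (t - 1) * hist[t - 1]
        w_light = total - w_dark
        if w_dark == 0 or w_light == 0:
            continue  # one class is empty, so there is nothing to separate
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Unnormalised between-class variance; dividing by total**2
        # would not change which t wins.
        between_var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


if __name__ == "__main__":
    # Evenly lit row: background 200, ink 50. Otsu separates them cleanly.
    even = [200] * 8 + [50] * 2 + [200] * 8
    print(binarise(even, otsu_threshold(even)) == [p == 50 for p in even])  # True

    # Unevenly lit row: the left half is bright (background 200, ink 120),
    # the right half is in shadow (background 100, ink 20). The left ink is
    # lighter than the right background, so no single threshold works.
    uneven = [200, 200, 120, 200, 200, 100, 100, 20, 100, 100]
    truth = [p in (120, 20) for p in uneven]
    print([t for t in range(257) if binarise(uneven, t) == truth])  # []
```

The shadowed row shows the contradiction directly. Marking the left-hand ink (120) as black needs a threshold above 120, but keeping the shadowed background (100) white needs one at or below 100. A threshold that varies across the page avoids this, which is the motivation for the adaptive approach the post defers to its follow-up.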