From 36c3400315b6be253590092f6863453d74e64b65 Mon Sep 17 00:00:00 2001
From: Nick White
Date: Wed, 9 Oct 2019 20:03:48 +0100
Subject: Improve binarisation introduction wording, and add a couple more images

---
 .../posts/binarisation-introduction/example-02.png | Bin 0 -> 800469 bytes
 .../posts/binarisation-introduction/example-03.png | Bin 0 -> 746016 bytes
 content/posts/binarisation-introduction/index.md   |  27 ++++++++++++---------
 3 files changed, 16 insertions(+), 11 deletions(-)
 create mode 100644 content/posts/binarisation-introduction/example-02.png
 create mode 100644 content/posts/binarisation-introduction/example-03.png

(limited to 'content')

diff --git a/content/posts/binarisation-introduction/example-02.png b/content/posts/binarisation-introduction/example-02.png
new file mode 100644
index 0000000..08abec0
Binary files /dev/null and b/content/posts/binarisation-introduction/example-02.png differ
diff --git a/content/posts/binarisation-introduction/example-03.png b/content/posts/binarisation-introduction/example-03.png
new file mode 100644
index 0000000..a12ca0e
Binary files /dev/null and b/content/posts/binarisation-introduction/example-03.png differ
diff --git a/content/posts/binarisation-introduction/index.md b/content/posts/binarisation-introduction/index.md
index 085b8c5..041f6fa 100644
--- a/content/posts/binarisation-introduction/index.md
+++ b/content/posts/binarisation-introduction/index.md
@@ -30,7 +30,7 @@ threshold that is too low will result in too many pixels being marked as
 black, which for OCR means that various non-text noise will be included
 and considered by the OCR engine, again reducing accuracy.
 
-( INSERT IMAGES DEMONSTRATING EACH )
+![Example of simple thresholding errors](example-02.png)
 
 If all page images were printed exactly the same way, and scanned the
 same way, we could probably get away with just picking an appropriate
@@ -50,17 +50,22 @@ common foreground.
 
 Otsu's algorithm works well for well printed material, on good paper,
 which has been well scanned, as the brightness of the background and
-foreground pixels is consistent. It works less well for pages which
-been scanned with have uneven lighting, as the background brightness
-may be quite different for one corner of a page than another. It is
-also not too good at handling paper or ink inconsistencies, such as
-blemishes, splotches or page grain, as they may well have parts
-which are darker than the threshold.
+foreground pixels is consistent.
 
-( INSERT IMAGES DEMONSTRATING OTSU FAILING ON BAD LIGHTING AND WITH
-  SPLOTCHES IN PAGE BEING BLACKENED )
+However, even with the most perfectly chosen threshold number, there
+are certain cases that no global threshold binarisation can do a good
+job at. Pages which been scanned with have uneven lighting do badly,
+as the background brightness may be quite different for one corner of
+a page than another. Global threshold binarisation can also have
+problems with paper or ink inconsistencies, such as blemishes,
+splotches or page grain, as they may well have parts which are darker
+than the global threshold.
+
+{{< figure src="example-03.png" caption="Example of an image that can't be satisfyingly binarised using any global threshold." >}}
+
+
 
 Both of these criticisms could be addressed by using an algorithm that
 could alter the threshold according to the conditions of the region on
-the page. That will be covered in the next blog post,
-[Adaptive Binarisation](/posts/adaptive-binarisation).
+the page. That will be covered in the next blog post.
-- 
cgit v1.2.1-24-ge1ad