author | Nick White <git@njw.name> | 2019-10-09 20:03:48 +0100 |
---|---|---|
committer | Nick White <git@njw.name> | 2019-10-09 20:03:48 +0100 |
commit | 36c3400315b6be253590092f6863453d74e64b65 (patch) | |
tree | c44e3b7940085a2813b8f9d29509191c80e09dea /content | |
parent | 6a2c722f784cf6dda502662e44ce15cd09e0fa14 (diff) |
Improve binarisation introduction wording, and add a couple more images
Diffstat (limited to 'content')
-rw-r--r-- | content/posts/binarisation-introduction/example-02.png | bin | 0 -> 800469 bytes |
-rw-r--r-- | content/posts/binarisation-introduction/example-03.png | bin | 0 -> 746016 bytes |
-rw-r--r-- | content/posts/binarisation-introduction/index.md | 27 |
3 files changed, 16 insertions, 11 deletions
diff --git a/content/posts/binarisation-introduction/example-02.png b/content/posts/binarisation-introduction/example-02.png
new file mode 100644
index 0000000..08abec0
--- /dev/null
+++ b/content/posts/binarisation-introduction/example-02.png
Binary files differ
diff --git a/content/posts/binarisation-introduction/example-03.png b/content/posts/binarisation-introduction/example-03.png
new file mode 100644
index 0000000..a12ca0e
--- /dev/null
+++ b/content/posts/binarisation-introduction/example-03.png
Binary files differ
diff --git a/content/posts/binarisation-introduction/index.md b/content/posts/binarisation-introduction/index.md
index 085b8c5..041f6fa 100644
--- a/content/posts/binarisation-introduction/index.md
+++ b/content/posts/binarisation-introduction/index.md
@@ -30,7 +30,7 @@ threshold that is too low will result in too many pixels being marked
 as black, which for OCR means that various non-text noise will be
 included and considered by the OCR engine, again reducing accuracy.
 
-( INSERT IMAGES DEMONSTRATING EACH )
+![Example of simple thresholding errors](example-02.png)
 
 If all page images were printed exactly the same way, and scanned the
 same way, we could probably get away with just picking an appropriate
@@ -50,17 +50,22 @@ common foreground.
 
 Otsu's algorithm works well for well printed material, on good paper,
 which has been well scanned, as the brightness of the background and
-foreground pixels is consistent. It works less well for pages which
-been scanned with have uneven lighting, as the background brightness
-may be quite different for one corner of a page than another. It is
-also not too good at handling paper or ink inconsistencies, such as
-blemishes, splotches or page grain, as they may well have parts
-which are darker than the threshold.
+foreground pixels is consistent.
 
-( INSERT IMAGES DEMONSTRATING OTSU FAILING ON BAD LIGHTING AND WITH
- SPLOTCHES IN PAGE BEING BLACKENED )
+However, even with the most perfectly chosen threshold number, there
+are certain cases that no global threshold binarisation can do a good
+job at. Pages which been scanned with have uneven lighting do badly,
+as the background brightness may be quite different for one corner of
+a page than another. Global threshold binarisation can also have
+problems with paper or ink inconsistencies, such as blemishes,
+splotches or page grain, as they may well have parts which are darker
+than the global threshold.
+
+{{< figure src="example-03.png" caption="Example of an image that can't be satisfyingly binarised using any global threshold." >}}
+
+<!-- TODO: image demonstrating global thresholding failing on bad lighting -->
 
 Both of these criticisms could be addressed by using an algorithm that
 could alter the threshold according to the conditions of the region on
-the page. That will be covered in the next blog post,
-[Adaptive Binarisation](/posts/adaptive-binarisation).
+the page. That will be covered in the next blog post<!--,
+[Adaptive Binarisation](/posts/adaptive-binarisation)-->.
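
The claim added in this commit, that some pages cannot be binarised well by any global threshold, can be checked on a toy example. The following is a rough sketch and not part of the commit: it assumes NumPy, uses the common convention that pixels at or below the threshold become black (the post may define the direction differently), and the synthetic page and the names `otsu_threshold` and `binarise` are made up for illustration.

```python
import numpy as np


def otsu_threshold(gray):
    """Pick the threshold that maximises between-class variance (Otsu's criterion)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # fraction of pixels with value <= t
    mu = np.cumsum(prob * np.arange(256))  # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    sigma_b = np.zeros(256)
    valid = denom > 0                      # skip thresholds that leave one class empty
    sigma_b[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))


def binarise(gray, t):
    """Assumed convention: True (black) where the pixel is at or below the threshold."""
    return gray <= t


# Synthetic page (illustrative only): the background fades from bright on the
# left to dim on the right, imitating uneven lighting during scanning.
height, width = 100, 200
background = np.linspace(220, 90, width).round().astype(np.int64)
page = np.tile(background, (height, 1))

# "Text": horizontal stripes of ink, always 80 levels darker than the local background.
ink = np.zeros((height, width), dtype=bool)
ink[(np.arange(height) % 10) < 3, :] = True
page[ink] -= 80
page = page.astype(np.uint8)

# Count misclassified pixels for every possible global threshold.
errors = [int((binarise(page, t) != ink).sum()) for t in range(256)]

t_otsu = otsu_threshold(page)
print("Otsu threshold:", t_otsu, "misclassified pixels:", errors[t_otsu])
print("best possible global threshold leaves", min(errors), "misclassified pixels")
```

On this page the ink at the bright edge (value 140) is lighter than the background at the dim edge (value 90), so every global threshold misclassifies some pixels. A threshold chosen per region would not have that problem.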