author Nick White <git@njw.name> 2019-10-09 20:03:48 +0100
committer Nick White <git@njw.name> 2019-10-09 20:03:48 +0100
commit 36c3400315b6be253590092f6863453d74e64b65
tree c44e3b7940085a2813b8f9d29509191c80e09dea /content/posts
parent 6a2c722f784cf6dda502662e44ce15cd09e0fa14
Improve binarisation introduction wording, and add a couple more images
Diffstat (limited to 'content/posts')
-rw-r--r-- content/posts/binarisation-introduction/example-02.png | bin 0 -> 800469 bytes
-rw-r--r-- content/posts/binarisation-introduction/example-03.png | bin 0 -> 746016 bytes
-rw-r--r-- content/posts/binarisation-introduction/index.md | 27
3 files changed, 16 insertions, 11 deletions
diff --git a/content/posts/binarisation-introduction/example-02.png b/content/posts/binarisation-introduction/example-02.png
new file mode 100644
index 0000000..08abec0
--- /dev/null
+++ b/content/posts/binarisation-introduction/example-02.png
Binary files differ
diff --git a/content/posts/binarisation-introduction/example-03.png b/content/posts/binarisation-introduction/example-03.png
new file mode 100644
index 0000000..a12ca0e
--- /dev/null
+++ b/content/posts/binarisation-introduction/example-03.png
Binary files differ
diff --git a/content/posts/binarisation-introduction/index.md b/content/posts/binarisation-introduction/index.md
index 085b8c5..041f6fa 100644
--- a/content/posts/binarisation-introduction/index.md
+++ b/content/posts/binarisation-introduction/index.md
@@ -30,7 +30,7 @@ threshold that is too low will result in too many pixels being marked as
black, which for OCR means that various non-text noise will be included
and considered by the OCR engine, again reducing accuracy.
-( INSERT IMAGES DEMONSTRATING EACH )
+![Example of simple thresholding errors](example-02.png)
If all page images were printed exactly the same way, and scanned the
same way, we could probably get away with just picking an appropriate
@@ -50,17 +50,22 @@ common foreground.
Otsu's algorithm works well for well printed material, on good paper,
which has been well scanned, as the brightness of the background and
-foreground pixels is consistent. It works less well for pages which
-been scanned with have uneven lighting, as the background brightness
-may be quite different for one corner of a page than another. It is
-also not too good at handling paper or ink inconsistencies, such as
-blemishes, splotches or page grain, as they may well have parts
-which are darker than the threshold.
+foreground pixels is consistent.
-( INSERT IMAGES DEMONSTRATING OTSU FAILING ON BAD LIGHTING AND WITH
- SPLOTCHES IN PAGE BEING BLACKENED )
+However, even with the most perfectly chosen threshold number, there
+are certain cases that no global threshold binarisation can do a good
+job at. Pages which been scanned with have uneven lighting do badly,
+as the background brightness may be quite different for one corner of
+a page than another. Global threshold binarisation can also have
+problems with paper or ink inconsistencies, such as blemishes,
+splotches or page grain, as they may well have parts which are darker
+than the global threshold.
+
+{{< figure src="example-03.png" caption="Example of an image that can't be satisfyingly binarised using any global threshold." >}}
+
+<!-- TODO: image demonstrating global thresholding failing on bad lighting -->
Both of these criticisms could be addressed by using an algorithm that
could alter the threshold according to the conditions of the region on
-the page. That will be covered in the next blog post,
-[Adaptive Binarisation](/posts/adaptive-binarisation).
+the page. That will be covered in the next blog post<!--,
+[Adaptive Binarisation](/posts/adaptive-binarisation)-->.