-rw-r--r-- | content/posts/binarisation-introduction/index.md | 10
-rw-r--r-- | content/who-we-are.md | 3
2 files changed, 9 insertions, 4 deletions
diff --git a/content/posts/binarisation-introduction/index.md b/content/posts/binarisation-introduction/index.md
index 5dfbabf..005f2ee 100644
--- a/content/posts/binarisation-introduction/index.md
+++ b/content/posts/binarisation-introduction/index.md
@@ -1,7 +1,6 @@
 ---
 title: "An Introduction to Binarisation"
-date: 2019-10-09
-draft: true
+date: 2019-10-22
 categories: [binarisation, preprocessing, image manipulation]
 ---
 Binarisation is the process of turning a colour or grayscale image into
@@ -12,6 +11,13 @@ makes various image manipulation tasks much more straightforward.
 OCR is one such process, and all major OCR engines today work on
 binarised images.
 
+Poor binarisation has been a key cause of poor OCR results for our work,
+so we have spent some time looking into better solutions to improve our
+results. We now generally pre-binarise page images before sending them
+to an OCR engine, which has yielded significant quality improvements to
+our OCR results. Before we get to that, it's worth looking in depth at
+how different binarisation methods work.
+
 Binarisation sounds pretty straightforward, and in the ideal case it is.
 You can pick a number, and go through each pixel in the image, checking
 if the pixel is lighter than the number, and if so declaring it to be
diff --git a/content/who-we-are.md b/content/who-we-are.md
index 2c7032d..5f237ab 100644
--- a/content/who-we-are.md
+++ b/content/who-we-are.md
@@ -2,8 +2,7 @@
 title: "Who we are"
 date: 2019-02-11
 menu: "main"
-draft: true
 ---
-Rescribe is a not-for-profit company focused on improving the state of OCR and related technologies for historical books and documents. Free and open source software is key to the work we do, and we release all the code and training data we create and use on [github](https://github.com/rescribe).
+Rescribe is a not-for-profit company focused on improving the state of OCR and related technologies for historical books and documents. Free and open source software is key to the work we do, and we release all the code and training data we create and use on [our git server](https://git.rescribe.xyz) (also mirrored on [github](https://github.com/rescribe)).
 
 We work with a variety of academic and archival projects to make historical works more accessible, searchable and discoverable, and to enable researchers to work with them and find new connections.