xyz.rescribe.rescribe Rescribe Rescribe Ltd OCR images, PDFs and Google Books

An easy-to-use desktop tool to recognise and extract text from images, PDFs and Google Books. It uses the Tesseract OCR engine, combined with modern and efficient preprocessing and analysis pipelines, to produce high-quality output in plain text, hOCR and searchable PDF formats. The tool was built with a focus on OCR of historical printed works, but it also includes modern language options and works well on modern printed works.

Main window: https://rescribe.xyz/rescribe/screenshot-03.png
Dark mode: https://rescribe.xyz/rescribe/screenshot-04.png
OCR in progress: https://rescribe.xyz/rescribe/screenshot-05.png
OCR completed: https://rescribe.xyz/rescribe/screenshot-06.png

https://rescribe.xyz/rescribe
MIT
GPL-3.0
xyz.rescribe.rescribe.desktop
#cdab8f #63452c

Fixed a bug where directories containing files with spaces in their names caused the process to fail. Added concatenated text output named bookname.txt. Fixed selecting a custom training in the Flatpak build. Fixed getgbook on arm64 macOS. Improved the layout of the log area so that it fills all available space in the window. Improved the readability of log area text.

Improved PDF reading by adding support for embedded CCITT images. Improved PDF parsing to prevent a possible crash with bad PDF files. Improved error messages for unreadable PDFs. Improved GUI theme thanks to an update to Fyne.

Thanks to our fabulous Kickstarter backers, lots of improvements! Added a GUI, a PDF extractor and a Google Book downloader. Created a single binary for OSX that covers both M1 and amd64. Added a file renamer so page files no longer need a particular naming format. Added an option to disable page wiping and an option to create a full size PDF.