xyz.rescribe.rescribe Rescribe Rescribe Ltd

High quality OCR for images, PDFs and Google Books.

An easy-to-use desktop tool for OCR of images, PDFs and Google Books. It uses the Tesseract OCR engine, combined with modern and efficient preprocessing and analysis pipelines, to produce high quality output in plain text, hOCR and searchable PDF formats. The tool was built with a focus on OCR of historical printed works, but it also includes modern language options and works well on modern printed works.

https://rescribe.xyz/rescribe/screenshot-03.png https://rescribe.xyz/rescribe/screenshot-04.png https://rescribe.xyz/rescribe MIT GPL-3.0 xyz.rescribe.rescribe.desktop

Fixed a bug where directories containing files with spaces in their names caused the process to fail. Added concatenated text output named bookname.txt. Fixed selecting a custom training in the Flatpak build. Fixed getgbook on arm64 macOS. Improved the layout of the log area so that it fills all available space in the window. Improved the readability of log area text.

Improved PDF reading by adding support for embedded CCITT images. Improved PDF parsing to prevent a possible crash with bad PDF files. Improved error messages for unreadable PDFs. Improved GUI theme thanks to an update to Fyne.

Thanks to our fabulous Kickstarter backers, lots of improvements! Added a GUI, a PDF extractor and a Google Books downloader. Created a single macOS binary for both M1 and amd64. Added a file renamer so page files no longer need a particular naming format. Added an option to disable page wiping and an option to create a full size PDF.