Age | Commit message | Author | |
---|---|---|---|
2019-05-08 | Add scrape-erara.sh script (not fully tested) | Nick White | |
2019-05-08 | Set DPI for images, and maximally compress jpg (with binarisation it doesn't make much difference) | Nick White | |
2019-05-08 | Add format-for-hocr-pdf.sh script | Nick White | |
2019-04-23 | Save dehyphenated text to a different file, rather than overwriting the original | Nick White | |
2019-04-23 | Add dehyphenate script | Nick White | |
2019-04-09 | Modify traintessv4.sh to include step to construct final training | Nick White | |
2019-04-02 | Fix bugs in traintessv4.sh | Nick White | |
2019-04-02 | Add tesseractv4 training script | Nick White | |
2019-03-26 | Make book graph scripts more robust to dodgy page filenames, and name bookgraph better | Nick White | |
2019-03-26 | Add nonewlines script | Nick White | |
2019-03-11 | Add basic bsb scraper | Nick White | |
2019-02-25 | Make bookgraph script more readable | Nick White | |
2019-02-25 | Add various helper scripts | Nick White | |