Date | Commit message | Author |
2020-01-26 | Fix fast version of training in traintessv5.sh | Nick White |
2020-01-22 | Try to capture git revision of ground truth used | Nick White |
2020-01-22 | Create fast version of training | Nick White |
2020-01-22 | Replace traintessv4 with traintessv5 script, which was used for fra-engbase t... | Nick White |
2019-10-23 | Add dir-to-pdfv3.sh, for use alongside bookpipeline | Nick White |
2019-10-23 | Add worst directory for fullocrdir script | Nick White |
2019-07-23 | fix hocrtotxtdir.sh | Nick White |
2019-07-15 | fix more eebotopdf bugs; hopefully more resilient now | Nick White |
2019-07-15 | Add unpaperdir.sh | Nick White |
2019-07-15 | Ensure eebotopdf.sh uses a /tmp dir for tmp files | Nick White |
2019-07-15 | ensure eebo pdfs are saved to the appropriate directory | Nick White |
2019-07-15 | Make fullocrdir.sh only do things that haven't been done before | Nick White |
2019-06-25 | Add fixoverwiped.sh script | Nick White |
2019-06-11 | Fix bug in checkoverwiping script | Nick White |
2019-06-11 | Add checkoverwiping script | Nick White |
2019-06-11 | Add eebotopdf script | Nick White |
2019-06-10 | Do bookgraph as standard when doing fullocrdir | Nick White |
2019-06-05 | Rename bookgraphv2.sh to the canonical bookgraph | Nick White |
2019-06-05 | Ensure bookgraph uses directory name even when run with . for current dir, an... | Nick White |
2019-06-03 | Add dir-to-pdfv2 script | Nick White |
2019-06-03 | Fix dir-to-pdf output naming | Nick White |
2019-05-15 | Adjust fullocrdir.sh to latest version of pgconf | Nick White |
2019-05-14 | Add bookgraphv2, to go hand in hand with fullocrdir | Nick White |
2019-05-14 | fix typo | Nick White |
2019-05-14 | Add fullocrdir script, which does multiple binarisation options and picks the... | Nick White |
2019-05-08 | Ensure dir-to-pdf saves to dirname.pdf not dirname/.pdf, and handle all diffe... | Nick White |
2019-05-08 | Make scrape scripts executable | Nick White |
2019-05-08 | Make scrapers more robust, and have them scrape into a directory per book | Nick White |
2019-05-08 | Make BNF scraper much more robust | Nick White |
2019-05-08 | Allow an argument to set pdf savefile, and resize pdf images to be way smaller | Nick White |
2019-05-08 | Rename pdf prep tool as it creates the pdf too now | Nick White |
2019-05-08 | Use sane page numbering for erara scraper | Nick White |
2019-05-08 | Add scrape-erara.sh script (not fully tested) | Nick White |
2019-05-08 | Set DPI for images, and maximally compress jpg (with binarisation it doesn't ... | Nick White |
2019-05-08 | Add format-for-hocr-pdf.sh script | Nick White |
2019-04-23 | Save dehyphenated text to a different file, rather than overwriting the original | Nick White |
2019-04-23 | Add dehyphenate script | Nick White |
2019-04-09 | Modify traintessv4.sh to include step to construct final training | Nick White |
2019-04-02 | Fix bugs in traintessv4.sh | Nick White |
2019-04-02 | Add tesseractv4 training script | Nick White |
2019-03-26 | Make book graph scripts more robust to dodgy page filenames, and name bookgra... | Nick White |
2019-03-26 | Add nonewlines script | Nick White |
2019-03-11 | Add basic bsb scraper | Nick White |
2019-02-25 | Make bookgraph script more readable | Nick White |
2019-02-25 | Add various helper scripts | Nick White |