Age | Commit message | Author |
2022-10-14 | update traintess python only | Nick White |
2022-10-14 | Revert "force python3 in traintess" | Nick White |
2022-10-14 | force python3 in traintess | Nick White |
2022-01-31 | update traintess and generate_line_box | Nick White |
2020-05-26 | Add overly bare leiden scraper | Nick White |
2020-04-20 | Add generate_line_box.py to this repo and reference it nicely in scripts | Nick White |
2020-02-13 | Fix up testtraining more | Nick White |
2020-02-13 | Fix up testtraining | Nick White |
2020-02-13 | Add testtraining script | Nick White |
2020-02-11 | Switch to 100000 iterations for training | Nick White |
2020-02-11 | Rename traintessv5.sh to traintess.sh | Nick White |
2020-01-26 | Fix fast version of training in traintessv5.sh | Nick White |
2020-01-22 | Try to capture git revision of ground truth used | Nick White |
2020-01-22 | Create fast version of training | Nick White |
2020-01-22 | Replace traintessv4 with traintessv5 script, which was used for fra-engbase t... | Nick White |
2019-10-23 | Add dir-to-pdfv3.sh, for use alongside bookpipeline | Nick White |
2019-10-23 | Add worst directory for fullocrdir script | Nick White |
2019-07-23 | fix hocrtotxtdir.sh | Nick White |
2019-07-15 | fix more eebotopdf bugs; hopefully more resiliant now | Nick White |
2019-07-15 | Add unpaperdir.sh | Nick White |
2019-07-15 | Ensure eebotopdf.sh uses a /tmp dir for tmp files | Nick White |
2019-07-15 | ensure eebo pdfs are saved to the appropriate directory | Nick White |
2019-07-15 | Make fullocrdir.sh only do things that haven't been done before | Nick White |
2019-06-25 | Add fixoverwiped.sh script | Nick White |
2019-06-11 | Fix bug in checkoverwiping script | Nick White |
2019-06-11 | Add checkoverwiping script | Nick White |
2019-06-11 | Add eebotopdf script | Nick White |
2019-06-10 | Do bookgraph as standard when doing fullocrdir | Nick White |
2019-06-05 | Rename bookgraphv2.sh to the canonical bookgraph | Nick White |
2019-06-05 | Ensure bookgraph uses directory name even when run with . for current dir, an... | Nick White |
2019-06-03 | Add dir-to-pdfv2 script | Nick White |
2019-06-03 | Fix dir-to-pdf output naming | Nick White |
2019-05-15 | Adjust fullocrdir.sh to latest version of pgconf | Nick White |
2019-05-14 | Add bookgraphv2, to go hand in hand with fullocrdir | Nick White |
2019-05-14 | fix typo | Nick White |
2019-05-14 | Add fullocrdir script, which does multiple binarisation options and picks the... | Nick White |
2019-05-08 | Ensure dir-to-pdf saves to dirname.pdf not dirname/.pdf, and handle all diffe... | Nick White |
2019-05-08 | Make scrape scripts executable | Nick White |
2019-05-08 | Make scrapers more robust, and have them scrape into a directory per book | Nick White |
2019-05-08 | Make BNF scraper much more robust | Nick White |
2019-05-08 | Allow an argument to set pdf savefile, and resize pdf images to be way smaller | Nick White |
2019-05-08 | Rename pdf prep tool as it creates the pdf too now | Nick White |
2019-05-08 | Use sane page numbering for erara scraper | Nick White |
2019-05-08 | Add scrape-erara.sh script (not fully tested) | Nick White |
2019-05-08 | Set DPI for images, and maximally compress jpg (with binarisation it doesn't ... | Nick White |
2019-05-08 | Add format-for-hocr-pdf.sh script | Nick White |
2019-04-23 | Save dehyphenated text to a different file, rather than overwriting the original | Nick White |
2019-04-23 | Add dehyphenate script | Nick White |
2019-04-09 | Modify traintessv4.sh to include step to construct final training | Nick White |
2019-04-02 | Fix bugs in traintessv4.sh | Nick White |