sh - Various shell scripts, mostly superseded by Go tools

Age	Commit message	Author
2020-02-11	Rename traintessv5.sh to traintess.sh	Nick White

2020-01-26	Fix fast version of training in traintessv5.sh	Nick White
	Tesseract relies on the position of arguments for lstmtraining, surprisingly. Anyway, this formulation works correctly.
2020-01-22	Try to capture git revision of ground truth used	Nick White

2020-01-22	Create fast version of training	Nick White

2020-01-22	Replace traintessv4 with traintessv5 script, which was used for fra-engbase training (very minor edits)	Nick White

2019-10-23	Add dir-to-pdfv3.sh, for use alongside bookpipeline	Nick White

2019-10-23	Add worst directory for fullocrdir script	Nick White

2019-07-23	fix hocrtotxtdir.sh	Nick White

2019-07-15	fix more eebotopdf bugs; hopefully more resilient now	Nick White

2019-07-15	Add unpaperdir.sh	Nick White

2019-07-15	Ensure eebotopdf.sh uses a /tmp dir for tmp files	Nick White

2019-07-15	ensure eebo pdfs are saved to the appropriate directory	Nick White

2019-07-15	Make fullocrdir.sh only do things that haven't been done before	Nick White

2019-06-25	Add fixoverwiped.sh script	Nick White

2019-06-11	Fix bug in checkoverwiping script	Nick White

2019-06-11	Add checkoverwiping script	Nick White

2019-06-11	Add eebotopdf script	Nick White

2019-06-10	Do bookgraph as standard when doing fullocrdir	Nick White

2019-06-05	Rename bookgraphv2.sh to the canonical bookgraph	Nick White
	Add word count to the graph. Use a scaled figure so it's easy to compare with the confidence.
2019-06-05	Ensure bookgraph uses directory name even when run with . for current dir, and ensure temp dirs are destroyed	Nick White

2019-06-03	Add dir-to-pdfv2 script	Nick White

2019-06-03	Fix dir-to-pdf output naming	Nick White

2019-05-15	Adjust fullocrdir.sh to latest version of pgconf	Nick White

2019-05-14	Add bookgraphv2, to go hand in hand with fullocrdir	Nick White

2019-05-14	fix typo	Nick White

2019-05-14	Add fullocrdir script, which does multiple binarisation options and picks the ones with the highest confidence	Nick White

2019-05-08	Ensure dir-to-pdf saves to dirname.pdf not dirname/.pdf, and handle all different naming conventions	Nick White

2019-05-08	Make scrape scripts executable	Nick White

2019-05-08	Make scrapers more robust, and have them scrape into a directory per book	Nick White

2019-05-08	Make BNF scraper much more robust	Nick White

2019-05-08	Allow an argument to set pdf savefile, and resize pdf images to be way smaller	Nick White

2019-05-08	Rename pdf prep tool as it creates the pdf too now	Nick White

2019-05-08	Use sane page numbering for erara scraper	Nick White

2019-05-08	Add scrape-erara.sh script (not fully tested)	Nick White

2019-05-08	Set DPI for images, and maximally compress jpg (with binarisation it doesn't make much difference)	Nick White

2019-05-08	Add format-for-hocr-pdf.sh script	Nick White

2019-04-23	Save dehyphenated text to a different file, rather than overwriting the original	Nick White

2019-04-23	Add dehyphenate script	Nick White

2019-04-09	Modify traintessv4.sh to include step to construct final training	Nick White

2019-04-02	Fix bugs in traintessv4.sh	Nick White

2019-04-02	Add tesseractv4 training script	Nick White

2019-03-26	Make book graph scripts more robust to dodgy page filenames, and name bookgraph better	Nick White

2019-03-26	Add nonewlines script	Nick White

2019-03-11	Add basic bsb scraper	Nick White

2019-02-25	Make bookgraph script more readable	Nick White

2019-02-25	Add various helper scripts	Nick White