Age | Commit message | Author | |
---|---|---|---|
2024-03-04 | Ensure an invalid jpeg is closed before removal is attempted, to fix crash on Windows | Nick White | |
2024-02-06 | Add concatenated text output to rescribe output dir, called bookname.txt | Nick White | |
2024-02-06 | Fix selecting a custom training file in flatpak | Nick White | |
This is done by copying any training to the temporary tesseract directory, and always using that as the TESSDIR. This works as it's writeable (unlike the /app/share directory that flatpak would otherwise use). | |||
2022-11-17 | rescribe: gofmt | Nick White | |
2022-11-17 | rescribe: support CCITTFaxDecode (tiff) encoded images in PDF reading | Nick White | |
2022-10-27 | Pass log around as pointer to fix go vet pointing out that this meant copying a sync.Mutex | Nick White | |
2022-10-27 | gofmt | Nick White | |
2022-10-27 | Allow completely non-embedded builds | Nick White | |
This enables installs straight from 'go install' or 'fyne install'. It also means warning if a system getgbook isn't found, and erroring if tesseract isn't found (as was done already). The location of getgbook can therefore now be specified on the command line. Embedded builds are enabled with the -tags embed flag, which the makefile sets for all builds. | |||
2022-03-22 | rescribe: update to rescribev9 as default training to use [v1.0.0] | Nick White | |
2022-03-21 | rescribe: Improve error messages if no pages are found | Nick White | |
2022-03-21 | rescribe: Update copyright years and add TODO file | Nick White | |
2022-03-21 | rescribe: Update traineddata descriptions in command line version | Nick White | |
2022-03-21 | Update tessdata to only include a few trainings | Nick White | |
2022-03-21 | rescribe: Improve cli wording and simplify PDF stuff slightly | Nick White | |
2022-03-21 | Only generate full-size PDF if requested | Nick White | |
This avoids the issue that large PDFs require a lot of RAM, so there are chances of running out of memory. Plus it's a waste of space and time. | |||
2022-03-11 | Add initial support for full-size PDF generation | Nick White | |
Some issues: 1) The PDF generation stores every page in memory while it constructs it. That means that there's a higher chance of failure due to running out of memory with these. There's no getting around this except by improving the PDF generation library, which is not easy. 2) Currently I've just changed the pipeline to always generate these full size PDFs, and then the rescribe tool will just delete them if they weren't requested. This is bad in particular because of point 1, and would probably cause failures in the server pipeline as a result. Therefore the plan is to add a tag to queue messages so that full size generation can be selectively enabled. Also, it should be split from the loop with colour pdf generation, as holding them both in RAM at the same time is unnecessary. | |||
2022-03-11 | Name PDF extracted images so they sort correctly | Nick White | |
2022-02-28 | rescribe: Add " searchable" to file name for saved PDF | Nick White | |
2022-02-28 | Add PreNoWipe queue, that just does binarisation but no wiping | Nick White | |
2022-02-23 | rescribe: fix typo with embedded getgbook running | Nick White | |
2022-02-23 | rescribe: Add embedded support for getgbook, for linux only so far | Nick White | |
2022-02-21 | Ensure that no new console windows are opened on Windows when executing Tesseract | Nick White | |
2022-01-31 | rescribe: Add context cancelling to extractPdfImgs(), so it's no longer possible to get the gui into a bad state by cancelling before startProcess began (hopefully) | Nick White | |
2022-01-31 | rescribe: Ensure status isn't overwritten after an abort, when wipe-only preprocessing | Nick White | |
2022-01-31 | Make pipeline context-aware, so the rescribe tool can cancel jobs | Nick White | |
2022-01-10 | rescribe: Rename PDFs taking into account that in some cases one or the other of binarised or colour may not exist | Nick White | |
2022-01-10 | rescribe: handle PDF errors much more gracefully | Nick White | |
2021-12-20 | rescribe: Ensure temporary tesseract data is only removed when the program ends, so multiple books can be processed by the gui one after the other | Nick White | |
2021-12-20 | rescribe: Ensure temporary tesseract dir is removed in gui mode too | Nick White | |
2021-12-20 | whitespace and error clarity changes | Nick White | |
2021-12-20 | fixed -png flag and changed rescribe tool to save binarized png in separate folder | Antonia Rescribe | |
2021-12-06 | pipeline: process jpg or png regardless of whether in wipe or preprocess queue | Nick White | |
2021-11-23 | rescribe: Remove debugging printfs related to PDF parsing | Nick White | |
2021-11-23 | rescribe: Improve pdf consumption by ensuring only jpg or png are saved to upload | Nick White | |
2021-11-22 | rescribe: Add support for reading images directly from PDFs | Nick White | |
There are several TODO items before this can be considered "good enough", let alone complete. See the comments in the code for details. On a good day, with a fair wind, though, this works. | |||
2021-11-22 | rescribe: replace errors.New with fmt.Errorf | Nick White | |
2021-11-02 | rescribe: handle directories with spaces correctly | Nick White | |
2021-10-26 | rescribe: Separate gui code, and organise it better (should be no functional change) | Nick White | |
2021-10-25 | rescribe: wip gui using fyne | Nick White | |
2021-10-12 | rescribe: fix lookup of external training file [v0.5.3] | Nick White | |
2021-10-01 | rescribe: Add embedded lat.traineddata | Nick White | |
2021-10-01 | rescribe: Add both original training path and embedded version on error output for training file not found, so that it's clear that the file specified may not exist | Nick White | |
2021-08-17 | pipeline: use regular storage for tests, rather than a separate one | Nick White | |
2021-08-02 | internal/pipeline: Add test (incomplete but working) for UploadImages | Nick White | |
2021-07-20 | Cleanup thanks to go vet | Nick White | |
2021-07-13 | gofmt | Nick White | |
2021-07-08 | rescribe: Exit with an error if directory doesn't exist | Nick White | |
2021-06-29 | rescribe: Add embed target for darwin (osx) too | Nick White | |
2021-06-22 | rescribe: Remove erroneous unnecessary mkdir | Nick White | |
2021-06-22 | rescribe: Make it clearer that embedded training files are available to use | Nick White | |