path: root/cmd/rescribe/main.go
Age | Commit message | Author
2024-04-02 | Add support for rotated images in PDFs [rotation] | Nick White
2024-03-04 | Ensure an invalid jpeg is closed before removal is attempted, to fix crash on Windows | Nick White
2024-02-06 | Add concatenated text output to rescribe output dir, called bookname.txt | Nick White
2024-02-06 | Fix selecting a custom training file in flatpak | Nick White
This is done by copying any training to the temporary tesseract directory, and always using that as the TESSDIR. This works because the temporary directory is writable (unlike the /app/share directory that flatpak would otherwise use).
2022-11-17 | rescribe: gofmt | Nick White
2022-11-17 | rescribe: support CCITTFaxDecode (tiff) encoded images in PDF reading | Nick White
2022-10-27 | Pass log around as pointer to fix go vet pointing out that this meant copying a sync.Mutex | Nick White
2022-10-27 | gofmt | Nick White
2022-10-27 | Allow completely non-embedded builds | Nick White
This enables installs straight from 'go install' or 'fyne install'. It also means warning if a system getgbook isn't found, and erroring if tesseract isn't found (as was done already). The location of getgbook can therefore now be specified on the command line. Embedded builds are enabled with the -tags embed flag, which the makefile sets for all builds.
2022-03-22 | rescribe: update to rescribev9 as default training to use [v1.0.0] | Nick White
2022-03-21 | rescribe: Improve error messages if no pages are found | Nick White
2022-03-21 | rescribe: Update copyright years and add TODO file | Nick White
2022-03-21 | rescribe: Update traineddata descriptions in command line version | Nick White
2022-03-21 | Update tessdata to only include a few trainings | Nick White
2022-03-21 | rescribe: Improve cli wording and simplify PDF stuff slightly | Nick White
2022-03-21 | Only generate full-size PDF if requested | Nick White
This avoids the problem that large PDFs require a lot of RAM, which risks running out of memory. Plus it's a waste of space and time.
2022-03-11 | Add initial support for full-size PDF generation | Nick White
Some issues:
1) The PDF generation stores every page in memory while it constructs the PDF. That means there's a higher chance of failure due to running out of memory. There's no getting around this except by improving the PDF generation library, which is not easy.
2) Currently I've just changed the pipeline to always generate these full-size PDFs, and the rescribe tool then deletes them if they weren't requested. This is bad, particularly because of point 1, and would probably cause failures in the server pipeline as a result.
Therefore the plan is to add a tag to queue messages so that full-size generation can be selectively enabled. Also, it should be split from the loop with colour PDF generation, as holding them both in RAM at the same time is unnecessary.
2022-03-11 | Name PDF extracted images so they sort correctly | Nick White
2022-02-28 | rescribe: Add " searchable" to file name for saved PDF | Nick White
2022-02-28 | Add PreNoWipe queue, that just does binarisation but no wiping | Nick White
2022-02-23 | rescribe: fix typo with embedded getgbook running | Nick White
2022-02-23 | rescribe: Add embedded support for getgbook, for linux only so far | Nick White
2022-02-21 | Ensure that no new console windows are opened on Windows when executing Tesseract | Nick White
2022-01-31 | rescribe: Add context cancelling to extractPdfImgs(), so it's no longer possible to get the gui into a bad state by cancelling before startProcess began (hopefully) | Nick White
2022-01-31 | rescribe: Ensure status isn't overwritten after an abort, when wipe-only preprocessing | Nick White
2022-01-31 | Make pipeline context-aware, so the rescribe tool can cancel jobs | Nick White
2022-01-10 | rescribe: Rename PDFs taking into account that in some cases one or the other of binarised or colour may not exist | Nick White
2022-01-10 | rescribe: handle PDF errors much more gracefully | Nick White
2021-12-20 | rescribe: Ensure temporary tesseract data is only removed when the program ends, so multiple books can be processed by the gui one after the other | Nick White
2021-12-20 | rescribe: Ensure temporary tesseract dir is removed in gui mode too | Nick White
2021-12-20 | whitespace and error clarity changes | Nick White
2021-12-20 | fixed -png flag and changed rescribe tool to save binarized png in separate folder | Antonia Rescribe
2021-12-06 | pipeline: process jpg or png regardless of whether in wipe or preprocess queue | Nick White
2021-11-23 | rescribe: Remove debugging printfs related to PDF parsing | Nick White
2021-11-23 | rescribe: Improve pdf consumption by ensuring only jpg or png are saved to upload | Nick White
2021-11-22 | rescribe: Add support for reading images directly from PDFs | Nick White
There are several TODO items before this can be considered "good enough", let alone complete. See the comments in the code for details. On a good day, with a fair wind, though, this works.
2021-11-22 | rescribe: replace errors.New with fmt.Errorf | Nick White
2021-11-02 | rescribe: handle directories with spaces correctly | Nick White
2021-10-26 | rescribe: Separate gui code, and organise it better (should be no functional change) | Nick White
2021-10-25 | rescribe: wip gui using fyne | Nick White
2021-10-12 | rescribe: fix lookup of external training file [v0.5.3] | Nick White
2021-10-01 | rescribe: Add embedded lat.traineddata | Nick White
2021-10-01 | rescribe: Add both original training path and embedded version on error output for training file not found, so that it's clear that the file specified may not exist | Nick White
2021-08-17 | pipeline: use regular storage for tests, rather than a separate one | Nick White
2021-08-02 | internal/pipeline: Add test (incomplete but working) for UploadImages | Nick White
2021-07-20 | Cleanup thanks to go vet | Nick White
2021-07-13 | gofmt | Nick White
2021-07-08 | rescribe: Exit with an error if directory doesn't exist | Nick White
2021-06-29 | rescribe: Add embed target for darwin (osx) too | Nick White
2021-06-22 | rescribe: Remove erroneous unnecessary mkdir | Nick White