Age  Commit message  Author
2022-01-04  rescribe: Restrict file types to select for .pdf and .traineddata file pickers  Nick White
2022-01-04  rescribe: add select box to choose training to use, including an Other... option  Nick White
2021-12-20  rescribe: Ensure temporary tesseract data is only removed when the program ends, so multiple books can be processed by the gui one after the other  Nick White
2021-12-20  rescribe: Improve layout of gui, and make dir entry box read only  Nick White
2021-12-20  rescribe: Ensure temporary tesseract dir is removed in gui mode too  Nick White
2021-12-20  rescribe: add "Choose PDF" button, and make chosen dir/file section a label rather than an entry  Nick White
2021-12-20  whitespace and error clarity changes  Nick White
2021-12-20  fixed -png flag and changed rescribe tool to save binarized png in separate folder  Antonia Rescribe
2021-12-20  rescribe: Include stderr in log area, and ensure button is re-enabled on failure  Nick White
2021-12-06  Update cloud settings to bookpipeline-v18.3  Nick White
2021-12-06  pipeline: process jpg or png regardless of whether in wipe or preprocess queue  Nick White
2021-12-06  graph: make page number parsing much more robust, and ensure fake numbers are used to create a coherent graph if any page numbers cannot be found from file names  Nick White
2021-12-06  pipeline: ignore any files with a non-image suffix, rather than erroring on them  Nick White
2021-11-23  rescribe: Remove debugging printfs related to PDF parsing  Nick White
2021-11-23  rescribe: Improve pdf consumption by ensuring only jpg or png are saved to upload  Nick White
2021-11-23  gofmt, plus update documentation of recently changed pipeline.UploadImages  Nick White
2021-11-22  internal/pipeline: remove old and broken requirement for TestStorageId()  Nick White
2021-11-22  changed put.go so that a 4-digit number is appended to the end of each filename when images are uploaded to the pipeline  Antonia Rescribe
2021-11-22  rescribe: Add support for reading images directly from PDFs  Nick White
            There are several TODO items before this can be considered "good enough", let alone complete. See the comments in the code for details. On a good day, with a fair wind, though, this works.
2021-11-22  rescribe: replace errors.New with fmt.Errorf  Nick White
2021-11-20  update spot image again  Nick White
2021-11-20  Update spot image to v18.0  Nick White
2021-11-20  Enable fyne gui again  Nick White
2021-11-09  lspipeline-ng: Remove debugging printf  Nick White
2021-11-02  rescribe: handle directories with spaces correctly  Nick White
2021-10-29  Temporarily disable fyne module, as it causes issues with go1.11 build  Nick White
2021-10-26  rescribe: Separate gui code, and organise it better (should be no functional change)  Nick White
2021-10-25  rescribe: wip gui using fyne  Nick White
2021-10-12  rescribe: fix lookup of external training file  (tag: v0.5.3)  Nick White
2021-10-01  rescribe: Include new tessdata in embed getter  (tag: v0.5.2)  Nick White
2021-10-01  rescribe: Add embedded lat.traineddata  Nick White
2021-10-01  rescribe: Add both original training path and embedded version on error output for training file not found, so that it's clear that the file specified may not exist  Nick White
2021-08-30  pdf: Always encode images as jpeg  (tag: v0.5.1)  Nick White
            Previously for PDFs using binarised images we kept them as PNG, but there's no good reason to do so, it's better to just get the space savings on offer from jpeg.
2021-08-30  adjusted the height of the image in the pdf to 1000px if the smaller option is chosen  Antonia Rescribe
2021-08-24  rescribe: improve makefile to match the way we deploy to the website  Nick White
2021-08-19  lspipeline-ng: Limit number of book details requests so we don't run into EC2's rate limiting  (tag: v0.5.0)  Nick White
2021-08-18  rescribe: Update documentation on how to deal with M1 signing, and move makefile to where it makes sense  Nick White
2021-08-17  pdf: Stretch words to fit in their boxes, for more perfect embedding  Nick White
            - Words are stretched to fit their boxes, which means the accuracy is now very high indeed. This was done by modifying gofpdf to add the SetCellStretchToFit function, which will hopefully be upstreamed in due course.
            - Copy pasting from a PDF works well with lines rarely if ever being erroneously broken by the PDF reader. There was quite a bit of trial-and-error to improve this, and the stretched text plus a space being added after the word in CellFormat was the best (plus preserves accuracy of word and character locations).
2021-08-17  pipeline: use regular storage for tests, rather than a separate one  Nick White
2021-08-09  pdf: use same line height and origin for all words on a line as it makes things neater in the PDF in most cases  Nick White
2021-08-09  pdf: significantly improve character coordinates  Nick White
            A few good changes to make word coordinate lookups significantly more accurate:
            - Set font size dynamically based on the line height (previously it was fixed as size 10)
            - Correct height and width of word boxes (previously they were way too large, which probably didn't make a difference in the general case, but now they're correct)
            - Set word box margin to zero
            Also change PDF size to A5 paper, as that's closer to an average book page size.
2021-08-02  rescribe: Add experimental m1 build  Nick White
2021-08-02  internal/pipeline: Add test (incomplete but working) for UploadImages  Nick White
2021-07-27  internal/pipeline: Add test to check that hidden files are skipped  Nick White
2021-07-27  Update dependencies  Nick White
2021-07-27  internal/pipeline: add tests for DetectQueueType  Nick White
2021-07-27  internal/pipeline: Add notreadable test to CheckImages  Nick White
2021-07-27  internal/pipeline: Add a test for CheckImages  Nick White
2021-07-20  Cleanup thanks to go vet  Nick White
2021-07-19  internal/pipeline: Be more explicit with exactly what functions are in each interface, to ensure no "duplicate function" errors when compiling  Nick White