Age | Commit message | Author | |
---|---|---|---|
2022-01-10 | rescribe: ensure books with a space in the name are handled correctly in the gui | Nick White | |
2022-01-10 | rescribe: Rename PDFs taking into account that in some cases one or the other of binarised or colour may not exist | Nick White | |
2022-01-10 | internal/pipeline: Have DownloadPdfs() try to download all PDFs, but only return an error if none downloaded, as there are times when the colour PDF will not exist, which is fine | Nick White | |
2022-01-10 | rescribe: handle PDF errors much more gracefully | Nick White | |
2022-01-04 | rescribe: parse stdout and set progress bar based on it, using appropriate labels for the progress bar text to show what's being done | Nick White | |
2022-01-04 | rescribe: Restrict file types to select for .pdf and .traineddata file pickers | Nick White | |
2022-01-04 | rescribe: add select box to choose training to use, including an Other... option | Nick White | |
2021-12-20 | rescribe: Ensure temporary tesseract data is only removed when the program ends, so multiple books can be processed by the gui one after the other | Nick White | |
2021-12-20 | rescribe: Improve layout of gui, and make dir entry box read only | Nick White | |
2021-12-20 | rescribe: Ensure temporary tesseract dir is removed in gui mode too | Nick White | |
2021-12-20 | rescribe: add "Choose PDF" button, and make chosen dir/file section a label rather than an entry | Nick White | |
2021-12-20 | whitespace and error clarity changes | Nick White | |
2021-12-20 | fixed -png flag and changed rescribe tool to save binarized png in separate folder | Antonia Rescribe | |
2021-12-20 | rescribe: Include stderr in log area, and ensure button is re-enabled on failure | Nick White | |
2021-12-06 | Update cloud settings to bookpipeline-v18.3 | Nick White | |
2021-12-06 | pipeline: process jpg or png regardless of whether in wipe or preprocess queue | Nick White | |
2021-12-06 | graph: make page number parsing much more robust, and ensure fake numbers are used to create a coherent graph if any page numbers cannot be found from file names | Nick White | |
2021-12-06 | pipeline: ignore any files with a non-image suffix, rather than erroring on them | Nick White | |
2021-11-23 | rescribe: Remove debugging printfs related to PDF parsing | Nick White | |
2021-11-23 | rescribe: Improve pdf consumption by ensuring only jpg or png are saved to upload | Nick White | |
2021-11-23 | gofmt, plus update documentation of recently changed pipeline.UploadImages | Nick White | |
2021-11-22 | internal/pipeline: remove old and broken requirement for TestStorageId() | Nick White | |
2021-11-22 | changed put.go so that a 4-digit number is appended to the end of each filename when images are uploaded to the pipeline | Antonia Rescribe | |
2021-11-22 | rescribe: Add support for reading images directly from PDFs | Nick White | |
There are several TODO items before this can be considered "good enough", let alone complete. See the comments in the code for details. On a good day, with a fair wind, though, this works. | |||
2021-11-22 | rescribe: replace errors.New with fmt.Errorf | Nick White | |
2021-11-20 | update spot image again | Nick White | |
2021-11-20 | Update spot image to v18.0 | Nick White | |
2021-11-20 | Enable fyne gui again | Nick White | |
2021-11-09 | lspipeline-ng: Remove debugging printf | Nick White | |
2021-11-02 | rescribe: handle directories with spaces correctly | Nick White | |
2021-10-29 | Temporarily disable fyne module, as it causes issues with go1.11 build | Nick White | |
2021-10-26 | rescribe: Separate gui code, and organise it better (should be no functional change) | Nick White | |
2021-10-25 | rescribe: wip gui using fyne | Nick White | |
2021-10-12 | rescribe: fix lookup of external training file (tag: v0.5.3) | Nick White | |
2021-10-01 | rescribe: Include new tessdata in embed getter (tag: v0.5.2) | Nick White | |
2021-10-01 | rescribe: Add embedded lat.traineddata | Nick White | |
2021-10-01 | rescribe: Add both original training path and embedded version on error output for training file not found, so that it's clear that the file specified may not exist | Nick White | |
2021-08-30 | pdf: Always encode images as jpeg (tag: v0.5.1) | Nick White | |
Previously, for PDFs using binarised images we kept them as PNG, but there's no good reason to do so; it's better to just get the space savings on offer from JPEG. | |||
2021-08-30 | adjusted the height of the image in the pdf to 1000px if the smaller option is chosen | Antonia Rescribe | |
2021-08-24 | rescribe: improve makefile to match the way we deploy to the website | Nick White | |
2021-08-19 | lspipeline-ng: Limit number of book details requests so we don't run into EC2's rate limiting (tag: v0.5.0) | Nick White | |
2021-08-18 | rescribe: Update documentation on how to deal with M1 signing, and move makefile to where it makes sense | Nick White | |
2021-08-17 | pdf: Stretch words to fit in their boxes, for more perfect embedding | Nick White | |
- Words are stretched to fit their boxes, which means the accuracy is now very high indeed. This was done by modifying gofpdf to add the SetCellStretchToFit function, which will hopefully be upstreamed in due course. | |||
- Copy-pasting from a PDF works well, with lines rarely if ever being erroneously broken by the PDF reader. There was quite a bit of trial and error to improve this; the best approach was stretched text plus a space added after each word in CellFormat (which also preserves the accuracy of word and character locations). | |||
2021-08-17 | pipeline: use regular storage for tests, rather than a separate one | Nick White | |
2021-08-09 | pdf: use same line height and origin for all words on a line as it makes things neater in the PDF in most cases | Nick White | |
2021-08-09 | pdf: significantly improve character coordinates | Nick White | |
A few good changes to make word coordinate lookups significantly more accurate: | |||
- Set font size dynamically based on the line height (previously it was fixed as size 10) | |||
- Correct height and width of word boxes (previously they were way too large, which probably didn't make a difference in the general case, but now they're correct) | |||
- Set word box margin to zero | |||
Also change PDF size to A5 paper, as that's closer to an average book page size. | |||
2021-08-02 | rescribe: Add experimental m1 build | Nick White | |
2021-08-02 | internal/pipeline: Add test (incomplete but working) for UploadImages | Nick White | |
2021-07-27 | internal/pipeline: Add test to check that hidden files are skipped | Nick White | |
2021-07-27 | Update dependencies | Nick White | |
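The 2021-12-06 graph commit says fake page numbers are used if any page number cannot be found from the file names. The Go sketch below shows one reading of that rule, assuming the page number is the last run of digits in the file name and that a single failure switches every page to sequential numbers. Both the naming convention and `pageNumbers` are assumptions, not the graph package's actual code.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
	"strconv"
	"strings"
)

// numRe captures the last run of digits in a file name (without extension).
// This naming convention is an assumption for the sketch.
var numRe = regexp.MustCompile(`(\d+)\D*$`)

// fakeNumbers returns 1..n, used when real page numbers are unavailable.
func fakeNumbers(n int) []int {
	nums := make([]int, n)
	for i := range nums {
		nums[i] = i + 1
	}
	return nums
}

// pageNumbers extracts a page number from each file name. If any name has
// no usable number, it falls back to sequential fake numbers for every page
// so the graph's x axis stays consistent.
func pageNumbers(names []string) []int {
	nums := make([]int, len(names))
	for i, name := range names {
		base := strings.TrimSuffix(filepath.Base(name), filepath.Ext(name))
		m := numRe.FindStringSubmatch(base)
		if m == nil {
			return fakeNumbers(len(names))
		}
		n, err := strconv.Atoi(m[1])
		if err != nil { // e.g. a digit run too long to fit in an int
			return fakeNumbers(len(names))
		}
		nums[i] = n
	}
	return nums
}

func main() {
	fmt.Println(pageNumbers([]string{"book_0003.jpg", "book_0010.jpg"})) // [3 10]
	fmt.Println(pageNumbers([]string{"book_0003.jpg", "cover.jpg"}))     // [1 2]
}
```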
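The 2021-12-06 pipeline commit switches from erroring on non-image files to ignoring them. A minimal Go sketch of that filtering follows. The accepted suffix set (`.jpg`, `.jpeg`, `.png`, compared case-insensitively) is an assumption; the pipeline's real list may differ.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// imageExts is an assumed set of accepted suffixes for this sketch.
var imageExts = map[string]bool{".jpg": true, ".jpeg": true, ".png": true}

// filterImages keeps only names whose (case-insensitive) suffix is an image
// type, silently dropping everything else instead of returning an error.
func filterImages(names []string) []string {
	var out []string
	for _, n := range names {
		if imageExts[strings.ToLower(filepath.Ext(n))] {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	fmt.Println(filterImages([]string{"0001.jpg", "notes.txt", "0002.PNG", "Thumbs.db"}))
	// prints "[0001.jpg 0002.PNG]"
}
```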
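The 2021-11-22 put.go commit appends a 4-digit number to each uploaded filename. Go's `%04d` verb produces a zero-padded number of that width. In the sketch below, the `_` separator, the 1-based counter and the helper name `numberedNames` are all assumptions; the commit message does not specify them.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// numberedNames inserts a zero-padded, 1-based sequence number before each
// file's extension, e.g. "scan.jpg" -> "scan_0001.jpg".
// The "_" separator and starting index are assumptions for this sketch.
func numberedNames(names []string) []string {
	out := make([]string, len(names))
	for i, n := range names {
		ext := filepath.Ext(n)
		out[i] = fmt.Sprintf("%s_%04d%s", strings.TrimSuffix(n, ext), i+1, ext)
	}
	return out
}

func main() {
	fmt.Println(numberedNames([]string{"scan.jpg", "scan.png"}))
	// prints "[scan_0001.jpg scan_0002.png]"
}
```

A fixed width keeps the names sorting in upload order when compared as plain strings, at least up to 9999 pages.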
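The other 2021-08-30 commit sets the image height to 1000px when the smaller option is chosen. Keeping the page undistorted then means scaling the width by the same factor: new width = width × 1000 / height. The hypothetical helper below computes only the target dimensions, rounded to the nearest pixel. The actual resampling is not shown, and the commit does not say how rescribe rounds.

```go
package main

import (
	"fmt"
	"math"
)

// scaleToHeight returns dimensions with the height set to target and the
// width scaled to preserve the aspect ratio, rounded to the nearest pixel.
// Non-positive heights are returned unchanged to avoid dividing by zero.
func scaleToHeight(w, h, target int) (int, int) {
	if h <= 0 {
		return w, h
	}
	return int(math.Round(float64(w) * float64(target) / float64(h))), target
}

func main() {
	w, h := scaleToHeight(2000, 3000, 1000)
	fmt.Println(w, h) // 667 1000
}
```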
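The 2021-08-09 pdf commit sets the font size from the line height and moves to A5 paper (148 mm × 210 mm). One way to express that conversion is to map hOCR pixel coordinates to millimetres on the page, then to points (1 pt = 1/72 inch, 1 inch = 25.4 mm). This is a hedged sketch: it assumes the scan is scaled to fill the page width and that one em of the font should equal the line height. Neither assumption is stated in the commit, and the helper names are hypothetical.

```go
package main

import "fmt"

const (
	a5WidthMM = 148.0 // A5 paper width
	mmPerInch = 25.4
	ptPerInch = 72.0
)

// pxToMM converts a pixel distance on the scanned page to millimetres on an
// A5 PDF page, assuming the image is scaled to fill the page width.
func pxToMM(px, imageWidthPx float64) float64 {
	return px * a5WidthMM / imageWidthPx
}

// fontSizeForLine returns a font size in points whose em height matches the
// line height, instead of a fixed size such as 10.
func fontSizeForLine(lineHeightPx, imageWidthPx float64) float64 {
	return pxToMM(lineHeightPx, imageWidthPx) * ptPerInch / mmPerInch
}

func main() {
	// A 1480px-wide scan maps 10px to 1mm, so a 50px line is 5mm tall.
	fmt.Printf("%.2f mm\n", pxToMM(50, 1480))          // 5.00 mm
	fmt.Printf("%.2f pt\n", fontSizeForLine(50, 1480)) // 14.17 pt
}
```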