Age | Commit message | Author | |
---|---|---|---|
2021-08-09 | pdf: significantly improve character coordinates | Nick White | |
A few good changes to make word coordinate lookups significantly more accurate: (1) set font size dynamically based on the line height (previously it was fixed as size 10); (2) correct height and width of word boxes (previously they were way too large, which probably didn't make a difference in the general case, but now they're correct); (3) set word box margin to zero. Also change PDF size to A5 paper, as that's closer to an average book page size. | |||
2021-08-02 | rescribe: Add experimental m1 build | Nick White | |
2021-08-02 | internal/pipeline: Add test (incomplete but working) for UploadImages | Nick White | |
2021-07-27 | internal/pipeline: Add test to check that hidden files are skipped | Nick White | |
2021-07-27 | Update dependencies | Nick White | |
2021-07-27 | internal/pipeline: add tests for DetectQueueType | Nick White | |
2021-07-27 | internal/pipeline: Add notreadable test to CheckImages | Nick White | |
2021-07-27 | internal/pipeline: Add a test for CheckImages | Nick White | |
2021-07-20 | Cleanup thanks to go vet | Nick White | |
2021-07-19 | internal/pipeline: Be more explicit with exactly what functions are in each interface, to ensure no "duplicate function" errors when compiling | Nick White | |
2021-07-13 | Fix up tests a bit | Nick White | |
2021-07-13 | gofmt | Nick White | |
2021-07-13 | internal/pipeline: Reorganise interfaces so that functions only declare what they need | Nick White | |
We were using Pipeliner as a catch-all, but it's nicer if the functions can just state that e.g. they need download functionality, so decompose things so that that's how we do things now. | |||
2021-07-13 | aws: Only look up test queue id when asked for, as for most Init()s it won't be needed | Nick White | |
2021-07-12 | Add necessary pipeliner dependency for testqueue (probably remove this from internal library later as it's only needed for tests) | Nick White | |
2021-07-12 | Add test for upAndQueue function | Nick White | |
This involved adding a test queue, so it can be run safely without interfering with the pipeline. | |||
2021-07-08 | rescribe: Exit with an error if directory doesn't exist | Nick White | |
2021-06-29 | rescribe: add documentation on how to generate embedded data | Nick White | |
2021-06-29 | rescribe: Add embed target for darwin (osx) too | Nick White | |
2021-06-22 | rescribe: Remove erroneous unnecessary mkdir | Nick White | |
2021-06-22 | rescribe: Make it clearer that embedded training files are available to use | Nick White | |
2021-06-22 | rescribe: add embedded tesseract for linux | Nick White | |
2021-06-22 | rescribe: allow use of embedded training even if -systess is used | Nick White | |
2021-06-22 | cloud: update spot image to latest version that won't attempt to build rescribe tool | Nick White | |
2021-06-22 | rescribe: Add go generate command to download the needed files to embed | Nick White | |
2021-06-22 | rescribe: Add an embedded tessdata | Nick White | |
2021-06-21 | Merge remote-tracking branch 'ssh/master' | Nick White | |
2021-06-21 | rescribe: Set up so only Tesseract needed for the build platform is embedded | Nick White | |
2021-06-21 | rescribe: Embed Tesseract into binary so that no Tesseract install is necessary | Nick White | |
2021-06-21 | update spot image used | Nick White | |
2021-06-15 | pipeline: Ignore hidden files when checking and uploading | Nick White | |
This prevents issues if a .DS_Store file is present in a directory. | |||
2021-05-31 | local: Only create a file once we are sure that it will be writeable | Nick White | |
2021-05-31 | Add a test for up(), and document download() and up() properly | Nick White | |
2021-05-31 | Fix bug after changing pipeliner for tests, to ensure DeleteObjects is available to Pipeliner | Nick White | |
2021-05-19 | Close process channel after writing to err channel in download(), in case of an error | Nick White | |
This is needed so that in tests the error can be selected out reliably, rather than an empty process signal. | |||
2021-05-19 | Add tests for download() | Nick White | |
2021-05-19 | Fix syntax with another Errorf call | Nick White | |
2021-05-19 | Local download now tries to open the source file before creating a destination file, so if it fails an empty file isn't left behind | Nick White | |
2021-05-19 | Add basic DeleteObjects implementation to local.go | Nick White | |
2021-05-19 | Fix syntax for some fmt.Errorf calls | Nick White | |
2021-04-12 | Update preproc dependency | Nick White | |
2021-03-16 | rescribe: change default training directory to trainings/v0.3.3 | Nick White | |
2021-02-22 | lspipeline: Rename to lspipeline-ng, and restore pre concurrency version to lspipeline as there are some hard to debug issues in concurrency version | Nick White | |
2021-02-15 | getsamplepages: Add -prefix option, and use 'best' to get random page numbers | Nick White | |
The -prefix option is useful to us. Previously only a .jpg for page number 100 was retrieved, which failed if the book had fewer (or unusually named) pages, and also didn't provide a corresponding .hocr at all (bug introduced with 48958d2). Using 'best', which is (effectively) randomly sorted, provides a guaranteed to exist page, and a random one at that. | |||
2021-02-05 | Merge branch 'master' of ssh://ssh.phx.nearlyfreespeech.net/home/public/bookpipeline | Nick White | |
2021-02-05 | Update go-chart dependency | Nick White | |
2021-02-01 | Update AWS dependency to 1.37.1 | Nick White | |
2021-02-01 | Ensure DeleteObjects can handle over 1000 files to delete; fixes rmbook for large books | Nick White | |
2021-01-26 | Make ListObjectsWithMeta generic again and create a specialised ListObjectWithMeta for single file listing, so we can still be as fast, but do not have a misleading api | Nick White | |
2021-01-26 | Improve lspipeline concurrency by removing WaitGroup stuff | Nick White | |
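
The 2021-02-01 entry above mentions making DeleteObjects cope with more than 1000 files. The S3 DeleteObjects API accepts at most 1000 keys per request, so a common way to handle larger sets is to split the key list into batches. The following is only a minimal sketch of that idea, not the code from this repository: `chunkKeys`, `deleteAll`, `maxDeleteBatch`, and the `del` callback are hypothetical names introduced for illustration, and `del` stands in for whatever single-request delete call the storage backend provides.

```go
package main

import "fmt"

// maxDeleteBatch is the most keys S3 accepts in one DeleteObjects request.
const maxDeleteBatch = 1000

// chunkKeys splits keys into consecutive slices of at most size elements.
// It returns nil if size is not positive or keys is empty.
func chunkKeys(keys []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(keys); start += size {
		end := start + size
		if end > len(keys) {
			end = len(keys)
		}
		batches = append(batches, keys[start:end])
	}
	return batches
}

// deleteAll calls del once per batch and stops at the first error.
// del is a hypothetical stand-in for a single storage delete request.
func deleteAll(keys []string, del func(batch []string) error) error {
	for _, batch := range chunkKeys(keys, maxDeleteBatch) {
		if err := del(batch); err != nil {
			return fmt.Errorf("deleting batch of %d keys: %w", len(batch), err)
		}
	}
	return nil
}

func main() {
	// Build 2500 made-up keys to stand in for the pages of a large book.
	keys := make([]string, 2500)
	for i := range keys {
		keys[i] = fmt.Sprintf("book/page%04d.jpg", i)
	}

	// Record batch sizes instead of contacting any real storage service.
	var sizes []int
	err := deleteAll(keys, func(batch []string) error {
		sizes = append(sizes, len(batch))
		return nil
	})
	fmt.Println(sizes, err) // prints "[1000 1000 500] <nil>"
}
```

Passing the delete call in as a function keeps the batching logic testable without network access, in the same spirit as the test queue and local storage entries in the log above.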