Age | Commit message | Author | |
---|---|---|---|
2020-02-20 | [pare-gt] gofmt | Nick White | |
2020-02-20 | [pare-gt] Fix sampling formula, make robust in the face of a 100% sample request, and fix up test output | Nick White | |
2020-02-20 | [pare-gt] Add some tests, and make deterministic | Nick White | |
These tests have uncovered at least 2 bugs that haven't yet been squashed: (1) 1% selection hangs; (2) 20% selection only takes as many as 10% | |||
2020-02-20 | [pare-gt] gofmt | Nick White | |
2020-02-19 | Split sampling functionality in pare-gt into a separate function that can be tested (coming soon) | Nick White | |
2020-02-11 | Add pare-gt tool | Nick White | |
2020-01-22 | Fix up boxtotxt tool | Nick White | |
2020-01-22 | Add GetWordConfs function to hocr pkg | Nick White | |
2020-01-22 | Add simple boxtotxt tool | Nick White | |
2019-11-12 | Clean up, and add comment explaining design choice to fonttobytes | Nick White | |
2019-11-12 | Add fonttobytes, to embed the font into pdf tools in due course | Nick White | |
2019-10-31 | Export a couple of more generally useful functions | Nick White | |
2019-10-30 | Simplify and document hocr package slightly better | Nick White | |
2019-10-23 | Add tiny doc.go, hopefully will ensure 'go get rescribe.xyz/utils' doesn't return an error for lack of .go files | Nick White | |
2019-10-23 | Make bucket-lines and related packages more robust | Nick White | |
bucket-lines would crash for any line that didn't have a corresponding image. Lines which weren't grayscale would also cause crashes; now they are just converted to grayscale if necessary. As a bonus, lines in jpeg can also be decoded successfully. | |||
2019-10-08 | Remove parts that have been moved elsewhere, and rename to rescribe.xyz/utils | Nick White | |
bookpipeline is now at rescribe.xyz/bookpipeline; preproc is now at rescribe.xyz/preproc; integralimg is now at rescribe.xyz/preproc/integralimg | |||
2019-10-07 | Ensure wipe pipeline uses the expected png files | Nick White | |
2019-10-02 | Improve usage notice for booktopipeline | Nick White | |
2019-10-02 | Add -prebinarised flag to booktopipeline | Nick White | |
2019-10-02 | gofmt | Nick White | |
2019-10-02 | Add wipeonly queue and functionality | Nick White | |
This is useful for prebinarised images, which don't need full preprocessing, but do require wiping, albeit with a more conservative threshold. | |||
2019-09-27 | Improve wiping procedure to work better with 2 column layouts | Nick White | |
2019-09-27 | Fix crash bug when graph was used on source with less than 10 pages | Nick White | |
2019-09-27 | One more update of graph.go to correspond to new go-chart, and improve usage for wipe | Nick White | |
2019-09-27 | Hardcode to ignore "workhorse" from logs | Nick White | |
2019-09-27 | Update usage of go-chart to correspond to latest version of library | Nick White | |
2019-09-24 | gofmt | Nick White | |
2019-09-24 | Improve ssh logs; ensure only fully operational servers are tried, and ensure connections to new ips not in known_hosts still succeed | Nick White | |
2019-09-24 | Do ssh log collection concurrently | Nick White | |
2019-09-24 | Get ssh logs from all running servers | Nick White | |
2019-09-24 | Add list of books done and in progress to lspipeline | Nick White | |
2019-09-24 | Rewrite GetInstanceDetails so page function is separate | Nick White | |
2019-09-24 | Move ec2 stuff out of lspipeline and into aws.go | Nick White | |
2019-09-23 | gofmt | Nick White | |
2019-09-23 | Move the sqs stuff out to aws.go | Nick White | |
2019-09-19 | Add queue listing to lspipeline | Nick White | |
2019-09-19 | Switch to using a goroutine for ec2 instance info, so can do all aws requests concurrently in due course | Nick White | |
2019-09-18 | Add start of lspipeline | Nick White | |
2019-09-17 | gofmt | Nick White | |
2019-09-16 | Be more careful to try to grab the message after a heartbeat failure more quickly | Nick White | |
Rather than waiting for the whole length of a visibility timeout, in which time another process may grab the message, instead wait a short amount of time, each time the message is searched for. Also add a bit more logging. | |||
2019-09-14 | Ensure enough time has elapsed before looking for the message to reget in the case of heartbeat running out | Nick White | |
2019-09-12 | Don't prefix date/time to logs, as logger will store that anyway | Nick White | |
2019-09-11 | Increase size of graph to 4k | Nick White | |
2019-09-11 | Fix bug with graph that prevented the ticks from being correct, thus ruining the graph | Nick White | |
2019-09-11 | Work around the SQS limit of 12 hours of visibility timeout | Nick White | |
This is done by checking for the error that is emitted in such a case, and if it's found trying several times to find the message back in the queue, and returning the message with an updated handle back to the caller to use in the future. | |||
2019-09-06 | Add flags to disable checking various queues | Nick White | |
2019-09-05 | Handle no words found error in a better way so any page that is actually 0 confidence is recognised | Nick White | |
2019-09-05 | Don't abort analysis if we encounter a hocr with no words, just skip it | Nick White | |
2019-09-05 | gofmt | Nick White | |
2019-09-05 | Update Pipeliner interface in getpipelinebook, and update some comments | Nick White | |