Age | Commit message (Collapse) | Author |
|
author, year and title and naming the directory appropriately
|
This means that even in multi-page hocrs containing lines with the same
id (like line_1_1), the page name will be different, so
extracthocrlines no longer mistakenly gives different lines the same
name and overwrites them.
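
A minimal sketch of the naming idea described above, not the actual extracthocrlines code. The helper lineName and the page names used here are hypothetical; the only point is that prefixing the line id with the page name keeps names unique across pages.

```go
package main

import "fmt"

// lineName is a hypothetical helper: it combines the page name with the
// line id, so that identical ids on different pages yield different names.
func lineName(pageName, lineID string) string {
	return pageName + "_" + lineID
}

func main() {
	// The same line id on two (made-up) pages no longer collides.
	fmt.Println(lineName("page_1", "line_1_1"))
	fmt.Println(lineName("page_2", "line_1_1"))
}
```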
|
cleanly
|
This was also triggering for erara, causing it to fail. As it's clearly
a Bodleian special case, we now check the URL is Bodleian before
applying it.
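
A rough sketch of what such a check could look like, under stated assumptions: the function name isBodleian, the host suffix bodleian.ox.ac.uk, and both example URLs are illustrative guesses rather than what the code actually uses.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// isBodleian is a hypothetical check: it reports whether rawurl points at
// a host under bodleian.ox.ac.uk (an assumed domain for this sketch).
func isBodleian(rawurl string) bool {
	u, err := url.Parse(rawurl)
	if err != nil {
		return false
	}
	host := strings.ToLower(u.Hostname())
	return host == "bodleian.ox.ac.uk" || strings.HasSuffix(host, ".bodleian.ox.ac.uk")
}

func main() {
	// Example URLs are made up for illustration only.
	for _, u := range []string{
		"https://iiif.bodleian.ox.ac.uk/iiif/manifest/example.json",
		"https://www.e-rara.ch/i3f/v20/example/manifest",
	} {
		fmt.Println(u, isBodleian(u))
	}
}
```

Matching on the parsed hostname rather than a substring of the whole URL avoids triggering on URLs that merely mention the name in their path.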
|
At the time of writing, the https://manuscrits-france-angleterre.org
website has expired certificates, which make accessing their images
a pain. While the issue is obviously with them, it's reasonable for
us to add a -insecure flag (emphatically not the default) to override
cert checking for cases like this.
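
A minimal sketch of how an opt-in flag like this can be wired up with the Go standard library. The helper names tlsConfigFor and newClient are hypothetical, and the real implementation may be structured differently; the important property is that verification is only skipped when the flag is explicitly given.

```go
package main

import (
	"crypto/tls"
	"flag"
	"net/http"
)

// tlsConfigFor is a hypothetical helper: certificate verification is only
// disabled when insecure is true.
func tlsConfigFor(insecure bool) *tls.Config {
	return &tls.Config{InsecureSkipVerify: insecure}
}

// newClient builds an HTTP client from a copy of the default transport,
// so the shared http.DefaultTransport is never modified.
func newClient(insecure bool) *http.Client {
	tr := http.DefaultTransport.(*http.Transport).Clone()
	tr.TLSClientConfig = tlsConfigFor(insecure)
	return &http.Client{Transport: tr}
}

func main() {
	// Defaults to false: certificates are checked unless -insecure is passed.
	insecure := flag.Bool("insecure", false, "skip TLS certificate verification (unsafe)")
	flag.Parse()

	client := newClient(*insecure)
	_ = client // a downloader would use this client for its requests
}
```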
|
files before exit
|
definitive service can be found
|
harvardartmuseums iiif manifest example url
|
a -bookdir flag to set download directory
|
versions of go
|
quality images with no visible watermark
|
- Ensure a final hyphen on the last word of a page isn't removed
- Only try to add a next word if there is one to take
- Ensure that if the following line holds only a single word, and that
word is taken, the line is blanked
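
The three points above can be sketched roughly as follows. This is an illustrative reimplementation under assumptions, not the project's code: the function name dehyphenate and the representation of a page as a slice of line strings are both hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// dehyphenate joins a word split by a line-final hyphen with the first
// word of the following line. It returns a new slice of the same length.
func dehyphenate(lines []string) []string {
	out := make([]string, len(lines))
	copy(out, lines)
	for i := 0; i < len(out); i++ {
		if !strings.HasSuffix(out[i], "-") {
			continue
		}
		// Last line of the page: nothing to join with, so keep the hyphen.
		if i+1 >= len(out) {
			continue
		}
		// Only take a next word if there is one.
		words := strings.Fields(out[i+1])
		if len(words) == 0 {
			continue
		}
		out[i] = strings.TrimSuffix(out[i], "-") + words[0]
		// If that was the only word, the following line ends up blank.
		out[i+1] = strings.Join(words[1:], " ")
	}
	return out
}

func main() {
	page := []string{"a won-", "derful", "exam-", "ple of text", "end-"}
	for _, l := range dehyphenate(page) {
		fmt.Printf("%q\n", l)
	}
}
```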
|
duplicate (no way of knowing, as if it is, it's downsized)
|
an annoying circular dependency)
|
request, and fix up test output
|
These tests have uncovered at least 2 bugs that haven't yet been squashed:
- 1% selection hangs
- 20% selection only takes as many as 10%
|