Age | Commit message | Author |
Example book: https://content.staatsbibliothek-berlin.de/dc/687222079/manifest
|
lengths of title and author to ensure it never exceeds ext4's 255-byte filename limit
|
many open files errors
|
and use it in extracthocrlines
|
author, year and title and naming the directory appropriately
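The following sketches one way to build such a directory name, assuming author, year and title have already been extracted from the book's metadata. The `bookMeta` type, the field order and the underscore separator are illustrative assumptions. On Linux only `/` and the NUL byte are forbidden in names; replacing spaces is merely a convenience for shell use.

```go
package main

import (
	"fmt"
	"strings"
)

// bookMeta holds the fields used to name the download directory.
// The type and its field names are hypothetical.
type bookMeta struct {
	Author, Year, Title string
}

// dirName joins the non-empty fields with underscores and replaces
// characters that are unsafe or awkward in directory names.
func dirName(m bookMeta) string {
	var parts []string
	for _, p := range []string{m.Author, m.Year, m.Title} {
		p = strings.TrimSpace(p)
		if p != "" {
			parts = append(parts, p)
		}
	}
	name := strings.Join(parts, "_")
	// "/" separates path components and NUL is forbidden outright;
	// spaces become underscores so the name is easy to type.
	r := strings.NewReplacer("/", "-", "\x00", "", " ", "_")
	return r.Replace(name)
}

func main() {
	m := bookMeta{Author: "Anon", Year: "1650", Title: "Letters/Epistles"}
	fmt.Println(dirName(m)) // prints "Anon_1650_Letters-Epistles"
}
```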
|
This means that even in multi-page hocrs containing lines with the same
id (like line_1_1), the page name will be different, so
extracthocrlines no longer mistakenly gives different lines the same
name and overwrites them.
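The fix amounts to making each output name unique per page as well as per line. The following sketch uses a hypothetical `lineName` helper and assumes page names are something like zero-padded page numbers; the real naming scheme may differ.

```go
package main

import "fmt"

// lineName builds the output name for an extracted line. Prefixing the
// page name means two pages that both contain "line_1_1" no longer
// produce the same name. The scheme is a hypothetical illustration.
func lineName(pageName, lineID string) string {
	return pageName + "_" + lineID
}

func main() {
	seen := map[string]bool{}
	lines := []struct{ page, id string }{
		{"0001", "line_1_1"},
		{"0002", "line_1_1"}, // same line id, different page
	}
	for _, l := range lines {
		name := lineName(l.page, l.id)
		if seen[name] {
			fmt.Println("collision, would overwrite:", name)
		}
		seen[name] = true
		fmt.Println(name)
	}
}
```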
|
cleanly
|
This was also triggering for erara, causing it to fail. As it's clearly
a Bodleian special case, we now check the URL is Bodleian before
applying it.
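The following sketch gates a site-specific workaround on the manifest's host. It compares the parsed hostname rather than searching the raw URL string, so it does not trigger on unrelated sites that merely mention Bodleian in a path or query. The `isBodleian` helper, the `bodleian.ox.ac.uk` domain and the sample URLs are assumptions for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// isBodleian reports whether a manifest URL is served from a Bodleian
// host. The domain is an assumption for illustration.
func isBodleian(manifestURL string) bool {
	u, err := url.Parse(manifestURL)
	if err != nil {
		return false
	}
	host := u.Hostname()
	// Match the domain itself or any subdomain, but not hosts that
	// merely end in the same letters (e.g. "notbodleian.ox.ac.uk").
	return host == "bodleian.ox.ac.uk" || strings.HasSuffix(host, ".bodleian.ox.ac.uk")
}

func main() {
	for _, m := range []string{
		"https://iiif.bodleian.ox.ac.uk/iiif/manifest/example.json",
		"https://www.e-rara.ch/example/manifest",
	} {
		if isBodleian(m) {
			fmt.Println("applying Bodleian special case:", m)
		} else {
			fmt.Println("skipping special case:", m)
		}
	}
}
```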
|
At the time of writing, the https://manuscrits-france-angleterre.org
website has expired certificates, which makes accessing their images
a pain. While the issue is obviously with them, it's reasonable for
us to add a -insecure flag (emphatically not the default) to override
cert checking for cases like this.
|
files before exit
|
definitive service can be found
|
harvardartmuseums iiif manifest example url
|
a -bookdir flag to set download directory
|
versions of go
|
quality images with no visible watermark
|
- Ensure a final hyphen on the last word of a page isn't removed
- Only try to add a next word if there is one to take
- Ensure that if a single word is on the following line, which is
taken, then the line is blanked
|
duplicate (there's no way of knowing, as if it is, it's downsized)