<feed xmlns='http://www.w3.org/2005/Atom'>
<title>bookpipeline, branch v0.5.3</title>
<subtitle>Tools to process books in a cloud based pipeline system</subtitle>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/'/>
<entry>
<title>rescribe: fix lookup of external training file</title>
<updated>2021-10-12T11:04:36+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2021-10-12T11:04:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=ebf623c447d4fe73801242262d4b584920235920'/>
<id>ebf623c447d4fe73801242262d4b584920235920</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>rescribe: Include new tessdata in embed getter</title>
<updated>2021-10-01T11:42:30+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2021-10-01T11:42:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=ff63756c71bf0d10484977bf6d08317c4782ec5e'/>
<id>ff63756c71bf0d10484977bf6d08317c4782ec5e</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>rescribe: Add embedded lat.traineddata</title>
<updated>2021-10-01T11:41:00+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2021-10-01T11:41:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=d855798edd597492ce54b2b70851718fb87d225d'/>
<id>d855798edd597492ce54b2b70851718fb87d225d</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>rescribe: Add both original training path and embedded version on error output for training file not found, so that it's clear that the file specified may not exist</title>
<updated>2021-10-01T11:40:41+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2021-10-01T11:40:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=bfe749b06a9f94d10156dbc6eb58b2276ac2267c'/>
<id>bfe749b06a9f94d10156dbc6eb58b2276ac2267c</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>pdf: Always encode images as jpeg</title>
<updated>2021-08-30T12:07:43+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2021-08-30T12:07:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=37a1581288447ca63412047fec0fb081043ba6fb'/>
<id>37a1581288447ca63412047fec0fb081043ba6fb</id>
<content type='text'>
Previously for PDFs using binarised images we kept them as PNG, but
there's no good reason to do so; it's better to just get the space
savings on offer from jpeg.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Previously for PDFs using binarised images we kept them as PNG, but
there's no good reason to do so; it's better to just get the space
savings on offer from jpeg.
</pre>
</div>
</content>
</entry>
<entry>
<title>adjusted the height of the image in the pdf to 1000px if the smaller option is chosen</title>
<updated>2021-08-30T11:49:23+00:00</updated>
<author>
<name>Antonia Rescribe</name>
<email>antonia@rescribe.xyz</email>
</author>
<published>2021-08-30T11:49:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=7bdfbedde14a21382440fbba8e65dc139c46b9f2'/>
<id>7bdfbedde14a21382440fbba8e65dc139c46b9f2</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>rescribe: improve makefile to match the way we deploy to the website</title>
<updated>2021-08-24T16:04:45+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2021-08-24T16:04:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=eea92760c9f9f2fa108cf759f2b4ca17b57e8364'/>
<id>eea92760c9f9f2fa108cf759f2b4ca17b57e8364</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>lspipeline-ng: Limit number of book details requests so we don't run into EC2's rate limiting</title>
<updated>2021-08-19T16:50:11+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2021-08-19T16:50:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=9f3fec3e0982c5b419338f68428f12bbeed4c2bb'/>
<id>9f3fec3e0982c5b419338f68428f12bbeed4c2bb</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>rescribe: Update documentation on how to deal with M1 signing, and move makefile to where it makes sense</title>
<updated>2021-08-18T21:37:40+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2021-08-18T21:37:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=bde651bbde9df3a8c33b705dbe33bbcaf4e3e73d'/>
<id>bde651bbde9df3a8c33b705dbe33bbcaf4e3e73d</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>pdf: Stretch words to fit in their boxes, for more perfect embedding</title>
<updated>2021-08-17T12:39:09+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2021-08-17T12:39:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=767b60db23311adaf1035e821bc189877d63b7f0'/>
<id>767b60db23311adaf1035e821bc189877d63b7f0</id>
<content type='text'>
- Words are stretched to fit their boxes, which means the accuracy
  is now very high indeed. This was done by modifying gofpdf to add
  the SetCellStretchToFit function, which will hopefully be
  upstreamed in due course.
- Copy-pasting from a PDF works well, with lines rarely if ever being
  erroneously broken by the PDF reader. There was quite a bit of
  trial-and-error to improve this, and stretching the text plus adding
  a space after the word in CellFormat worked best (it also preserves
  the accuracy of word and character locations).
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- Words are stretched to fit their boxes, which means the accuracy
  is now very high indeed. This was done by modifying gofpdf to add
  the SetCellStretchToFit function, which will hopefully be
  upstreamed in due course.
- Copy-pasting from a PDF works well, with lines rarely if ever being
  erroneously broken by the PDF reader. There was quite a bit of
  trial-and-error to improve this, and stretching the text plus adding
  a space after the word in CellFormat worked best (it also preserves
  the accuracy of word and character locations).
</pre>
</div>
</content>
</entry>
</feed>
