<feed xmlns='http://www.w3.org/2005/Atom'>
<title>bookpipeline, branch v0.3.2</title>
<subtitle>Tools to process books in a cloud-based pipeline system</subtitle>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/'/>
<entry>
<title>[rescribe] Fix up *.hocr glob, which ensures that using a savedir that already has a hocr directory in it will work</title>
<updated>2020-12-07T17:04:12+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2020-12-07T17:04:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=17b2d91d5f323fd985ca012e50d36908cbceba87'/>
<id>17b2d91d5f323fd985ca012e50d36908cbceba87</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>[rescribe] Allow saving of results to somewhere other than a directory named after the book being processed</title>
<updated>2020-12-07T16:53:58+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2020-12-07T16:53:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=068ad0b666705a49ab22d7b48cd6a7d67b37f234'/>
<id>068ad0b666705a49ab22d7b48cd6a7d67b37f234</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Ensure mkdir will succeed in upload</title>
<updated>2020-12-04T17:12:59+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2020-12-04T17:12:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=4fcbfba65689dc5e8ad46ba467343d3da376d92a'/>
<id>4fcbfba65689dc5e8ad46ba467343d3da376d92a</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>[rescribe] Fix portability issue where hocrs may not be correctly moved and txt-ified on Windows</title>
<updated>2020-12-03T15:20:15+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2020-12-03T15:20:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=cbe02a57377787cd34172453a477f68f200448e8'/>
<id>cbe02a57377787cd34172453a477f68f200448e8</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Don't upload binarised pdf twice needlessly</title>
<updated>2020-12-03T15:16:14+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2020-12-03T15:13:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=38dbdd0b21fb363e3f63fd3ea50272975e98eb77'/>
<id>38dbdd0b21fb363e3f63fd3ea50272975e98eb77</id>
<content type='text'>
This can also result in the file being uploaded twice simultaneously,
as up() runs in a separate goroutine. This can cause failures on
Windows, as one upload process attempts to remove the file while the
other still has it open for upload. It could probably also fail if
one upload completed (and so deleted the file) before the other had
started.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This can also result in the file being uploaded twice simultaneously,
as up() runs in a separate goroutine. This can cause failures on
Windows, as one upload process attempts to remove the file while the
other still has it open for upload. It could probably also fail if
one upload completed (and so deleted the file) before the other had
started.
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'master' of ssh://hammerhead/home/nick/rescribe/src/bookpipeline</title>
<updated>2020-11-30T19:14:47+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2020-11-30T19:14:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=df364cbf93e4ab5b4db8b924b8396f4fb9caa149'/>
<id>df364cbf93e4ab5b4db8b924b8396f4fb9caa149</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Add getstats tool</title>
<updated>2020-11-30T19:13:53+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2020-11-30T19:13:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=190e095b04ce61041d16eb5d0109f5073b83f624'/>
<id>190e095b04ce61041d16eb5d0109f5073b83f624</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>[booktopipeline] Add a check to disallow adding a book that already exists</title>
<updated>2020-11-24T12:40:54+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2020-11-24T12:40:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=0d914a5de3f8169d41df4fcff1ee4aea6d01afbe'/>
<id>0d914a5de3f8169d41df4fcff1ee4aea6d01afbe</id>
<content type='text'>
This is important because if a book that has already been processed
is added again, an analyse job will be added every time a page is
OCRed, which will clog up the pipeline with unnecessary work. Also,
if a book were added with the same name but differently named files,
or a different number of pages, the results would almost certainly
not be as intended.

If a book really does need to be added under a particular name,
either the original directory can be removed from S3, or "v2" or
similar can be appended to the book name before calling
booktopipeline.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is important because if a book that has already been processed
is added again, an analyse job will be added every time a page is
OCRed, which will clog up the pipeline with unnecessary work. Also,
if a book were added with the same name but differently named files,
or a different number of pages, the results would almost certainly
not be as intended.

If a book really does need to be added under a particular name,
either the original directory can be removed from S3, or "v2" or
similar can be appended to the book name before calling
booktopipeline.
</pre>
</div>
</content>
</entry>
<entry>
<title>Switch to a maintained version of gofpdf</title>
<updated>2020-11-18T15:19:28+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2020-11-18T15:19:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=0b9bd466dd2e099bf6c7d3165f1285f4b7a8f38e'/>
<id>0b9bd466dd2e099bf6c7d3165f1285f4b7a8f38e</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Describe rescribe tool in documentation</title>
<updated>2020-11-18T13:47:44+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2020-11-18T13:47:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=82ee93b53ae4fcef619543f643dc626f6c9353cf'/>
<id>82ee93b53ae4fcef619543f643dc626f6c9353cf</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
</feed>
