<feed xmlns='http://www.w3.org/2005/Atom'>
<title>bookpipeline/cmd, branch v0.3.2</title>
<subtitle>Tools to process books in a cloud based pipeline system</subtitle>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/'/>
<entry>
<title>[rescribe] Fix up *.hocr glob, which ensures that using a savedir that already has a hocr directory in it will work</title>
<updated>2020-12-07T17:04:12+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2020-12-07T17:04:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=17b2d91d5f323fd985ca012e50d36908cbceba87'/>
<id>17b2d91d5f323fd985ca012e50d36908cbceba87</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>[rescribe] Allow saving of results to somewhere other than a directory named after the book being processed</title>
<updated>2020-12-07T16:53:58+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2020-12-07T16:53:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=068ad0b666705a49ab22d7b48cd6a7d67b37f234'/>
<id>068ad0b666705a49ab22d7b48cd6a7d67b37f234</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>[rescribe] Fix portability issue where hocrs may not be correctly moved and txt-ified on windows</title>
<updated>2020-12-03T15:20:15+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2020-12-03T15:20:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=cbe02a57377787cd34172453a477f68f200448e8'/>
<id>cbe02a57377787cd34172453a477f68f200448e8</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'master' of ssh://hammerhead/home/nick/rescribe/src/bookpipeline</title>
<updated>2020-11-30T19:14:47+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2020-11-30T19:14:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=df364cbf93e4ab5b4db8b924b8396f4fb9caa149'/>
<id>df364cbf93e4ab5b4db8b924b8396f4fb9caa149</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Add getstats tool</title>
<updated>2020-11-30T19:13:53+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2020-11-30T19:13:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=190e095b04ce61041d16eb5d0109f5073b83f624'/>
<id>190e095b04ce61041d16eb5d0109f5073b83f624</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>[booktopipeline] Add a check to disallow adding a book that already exists</title>
<updated>2020-11-24T12:40:54+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2020-11-24T12:40:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=0d914a5de3f8169d41df4fcff1ee4aea6d01afbe'/>
<id>0d914a5de3f8169d41df4fcff1ee4aea6d01afbe</id>
<content type='text'>
This is important because if a book that already exists in the
pipeline is added again, an analyse job will be added every time a
page is OCRed, which will clog up the pipeline with unnecessary work.
Also, if a book were added with the same name but differently named
files, or a different number of pages, the results would almost
certainly not be as intended.

If a book really does need to be added under a particular name,
either the original directory can be removed on S3, or "v2" or
similar can be appended to the book name before calling
booktopipeline.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is important because if a book that already exists in the
pipeline is added again, an analyse job will be added every time a
page is OCRed, which will clog up the pipeline with unnecessary work.
Also, if a book were added with the same name but differently named
files, or a different number of pages, the results would almost
certainly not be as intended.

If a book really does need to be added under a particular name,
either the original directory can be removed on S3, or "v2" or
similar can be appended to the book name before calling
booktopipeline.
</pre>
</div>
</content>
</entry>
<entry>
<title>Add trimqueue and logwholequeue utilities which can help deal with weird queue states</title>
<updated>2020-11-17T16:37:54+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2020-11-17T16:37:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=2717c5ed21a082a7f24833f3d57b303fd22bd4e5'/>
<id>2717c5ed21a082a7f24833f3d57b303fd22bd4e5</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Remove _bin0.x from txt filenames</title>
<updated>2020-11-17T12:24:42+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2020-11-17T12:24:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=f71fd636f151e5cb7eafb2ae6c21c1c188d43fdd'/>
<id>f71fd636f151e5cb7eafb2ae6c21c1c188d43fdd</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>[rescribe] Default to an appropriate tesscmd for Windows</title>
<updated>2020-11-16T17:43:13+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2020-11-16T17:43:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=eefa8f50d7ab915ce426c837cf504d26b7d4ccee'/>
<id>eefa8f50d7ab915ce426c837cf504d26b7d4ccee</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>[rescribe] Add txt output, only keep colour pdf, and reorganise files so they're more user-friendly</title>
<updated>2020-11-16T16:44:42+00:00</updated>
<author>
<name>Nick White</name>
<email>git@njw.name</email>
</author>
<published>2020-11-16T16:44:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.rescribe.xyz/cgit/cgit.cgi/bookpipeline/commit/?id=56c1cf041aec9cb2352a3bd4a4b46e65a3cc04c0'/>
<id>56c1cf041aec9cb2352a3bd4a4b46e65a3cc04c0</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
</feed>
