author | Nick White <git@njw.name> | 2021-06-29 12:57:45 +0100 |
---|---|---|
committer | Nick White <git@njw.name> | 2021-06-29 12:57:45 +0100 |
commit | 690181056991c0c5ecea3de019581da0243d7635 | |
tree | 0741c9a0fec12998f2ddece74fe25383d5864c57 /cmd | |
parent | a798db15135d56205bbe3660741b53837094914b |
rescribe: add documentation on how to generate embedded data
Diffstat (limited to 'cmd')
-rw-r--r-- | cmd/rescribe/EMBEDDING_NOTES.md | 29 |
1 file changed, 29 insertions, 0 deletions
diff --git a/cmd/rescribe/EMBEDDING_NOTES.md b/cmd/rescribe/EMBEDDING_NOTES.md
new file mode 100644
index 0000000..930f2d7
--- /dev/null
+++ b/cmd/rescribe/EMBEDDING_NOTES.md
@@ -0,0 +1,29 @@
+The embedded copies of Tesseract are fetched by `go generate` from
+copies that are stored online.
+
+To create them yourself, you need to create a .zip file that contains
+the tesseract executable, plus any libraries that are needed for it
+to run.
+
+It must be linked so that these libraries are accessed from the same
+directory as the executable. On Windows this is the default
+behaviour. On Linux we just create a static binary using the
+simplemake branch of Tesseract, available at
+https://github.com/nickjwhite/tesseract
+
+On OSX it's a bit more complicated. We install Tesseract on a host
+machine using homebrew, then copy the binary, and run
+`otool -L tesseract` to find the libraries that need to be copied
+as well. Then `otool -L libname.dylib` needs to be run for each
+library to find all non-system libraries they depend on, to copy.
+Once that is done, `install_name_tool` needs to be run on the
+binary and libraries to set the lookup path to the local directory,
+like this:
+ install_name_tool -change /usr/local/opt/libpng/lib/libpng16.16.dylib @executable_path/libpng16.16.dylib liblept.5.dylib
+You can find the path names to replace using `otool -L`.
+This is all taken from a great guide on how to do this:
+http://thecourtsofchaos.com/2013/09/16/how-to-copy-and-relink-binaries-on-osx/
+
+The embedded tessdata is much easier to create, it's just a
+standard tessdata from an install on any platform, plus any
+additional .traineddata files you want to include.
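The OSX relinking steps described in the notes could be scripted roughly as follows. This is only a sketch and not part of the commit: the function names `nonSystemLibs` and `relinkArgs` are hypothetical, the sample `otool -L` output uses invented version numbers, and it assumes that anything under `/usr/lib/` or `/System/` counts as a system library and that library paths contain no spaces. It only builds the `install_name_tool -change` command lines; it does not run them.

```go
package main

import (
	"bufio"
	"fmt"
	"path"
	"strings"
)

// nonSystemLibs parses the text printed by `otool -L <file>` and returns the
// install names of libraries that would need to be copied next to the binary.
// Assumption: paths under /usr/lib/ or /System/ are system libraries, and
// names starting with "@" are already relative, so both are skipped.
func nonSystemLibs(otoolOutput string) []string {
	var libs []string
	sc := bufio.NewScanner(strings.NewReader(otoolOutput))
	for sc.Scan() {
		line := sc.Text()
		// Dependency lines are indented; the unindented first line
		// just names the file that was inspected.
		if !strings.HasPrefix(line, "\t") && !strings.HasPrefix(line, " ") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue
		}
		name := fields[0]
		if strings.HasPrefix(name, "/usr/lib/") ||
			strings.HasPrefix(name, "/System/") ||
			strings.HasPrefix(name, "@") {
			continue
		}
		libs = append(libs, name)
	}
	return libs
}

// relinkArgs builds one `install_name_tool -change` command per library,
// pointing target at a copy of that library in the executable's directory.
func relinkArgs(target string, libs []string) [][]string {
	var cmds [][]string
	for _, lib := range libs {
		cmds = append(cmds, []string{
			"install_name_tool", "-change", lib,
			"@executable_path/" + path.Base(lib), target,
		})
	}
	return cmds
}

func main() {
	// Illustrative otool -L output; the version numbers are made up.
	sample := "liblept.5.dylib:\n" +
		"\t/usr/local/opt/libpng/lib/libpng16.16.dylib (compatibility version 54.0.0, current version 54.0.0)\n" +
		"\t/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.100.1)\n"

	for _, cmd := range relinkArgs("liblept.5.dylib", nonSystemLibs(sample)) {
		fmt.Println(strings.Join(cmd, " "))
	}
}
```

With the sample input, the generated command has the same shape as the `install_name_tool` line in the notes. To actually relink, each argument list would have to be run on a macOS host, for example with `os/exec`.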