Installing and using Tesseract on Ubuntu


convert a.jpg a.tif

Installing Tesseract

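【1】Direct installation
1) On Ubuntu 14.04 you can install the distribution package tesseract-ocr directly:
sudo apt-get install tesseract-ocr
Installed this way, the program is in /usr/bin and the data files are in /usr/share/tesseract-ocr/tessdata (the eng pack is already included).
Under /usr/local/lib/python*.*/dist-packages there is a pytesseract folder
(I may have installed it by accident at some point; it is on GitHub, or can be installed with
pip install pytesseract),
so tesseract can be used from Python. For example:
from PIL import Image  # Pillow provides the PIL package
import pytesseract
# Default language
print(pytesseract.image_to_string(Image.open('./Test/Python/t2.png')))
# Explicitly select the English data
print(pytesseract.image_to_string(Image.open('./Test/Python/t2.png'), lang='eng'))
After copying your own trained digit data file num.traineddata into the data file directory:
print(pytesseract.image_to_string(Image.open('./Test/Python/t2.png'), lang='num'))
Recognition of non-standard digits then becomes quite accurate!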
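Before passing lang='num' to pytesseract, it can help to confirm that the traineddata file really is in the data directory. The following is only a minimal sketch: the helper name installed_languages is made up for illustration, and the path assumes the apt layout described above.

```python
import os

def installed_languages(tessdata_dir):
    """Return the sorted language codes that have a .traineddata file in tessdata_dir."""
    suffix = ".traineddata"
    if not os.path.isdir(tessdata_dir):
        return []
    return sorted(
        name[:-len(suffix)]
        for name in os.listdir(tessdata_dir)
        if name.endswith(suffix)
    )

# Path taken from the apt install described above; adjust it for a source install.
langs = installed_languages("/usr/share/tesseract-ocr/tessdata")
print("num available:", "num" in langs)
```

If "num" is not listed, the lang='num' call has no data file to load.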
2) The tesseract-ocr installed this way has one problem: the tesseract command does not work from the terminal and reports the following error (although it works from Python):
Tesseract Open Source OCR Engine v3.03 with Leptonica
Error in pixReadStreamPng: function not present
Error in pixReadStream: png: no pix returned
Error in pixRead: pix not read
Error in pixGetInputFormat: pix not defined
Reading ./Test/Python/t2.png as a list of filenames…
Error in fopenReadStream: file not found
Error in pixRead: image file not found: �PNG
Image file �PNG cannot be read!
Error during processing.
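According to posts online, this is because Leptonica does not recognize the png, tif, or jpg formats (in fact it recognizes hardly any format; I really do not see why Tesseract still depends on this library).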
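The "function not present" lines in the log above usually mean that this Leptonica build was compiled without the matching image library. A small sketch for pulling the affected functions and formats out of such a log; the function name missing_leptonica_support is hypothetical, and the patterns are based only on the messages shown above:

```python
import re

def missing_leptonica_support(log_text):
    """Extract the Leptonica functions and image formats reported as unavailable."""
    functions = re.findall(r"Error in (\w+): function not present", log_text)
    formats = re.findall(r"Error in pixReadStream: (\w+): no pix returned", log_text)
    return {"functions": functions, "formats": formats}

# Sample lines copied from the error output above.
log = """Error in pixReadStreamPng: function not present
Error in pixReadStream: png: no pix returned
Error in pixRead: pix not read"""
print(missing_leptonica_support(log))
```

The reported formats show which image libraries to install before rebuilding Leptonica.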

3. Install leptonica

### Without it you get the error: Leptonica 1.74 or higher is required. Try to install libleptonica-dev package.

cd git

## Clone the leptonica project from git to the local machine

git clone    

cd leptonica

autoreconf -vi

./autobuild

./configure

make

sudo make install

You will see the recognized text. Great! Now you can call it from a program to recognize text.

(I have not solved this problem yet.)

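【2】Installing from source
1) First install leptonica. Download page: www.leptonica.org/download.html; for example, download leptonica-1.68.tar.gz
Then install it. The basic installation below is enough (customized leptonica builds are left to those who are interested):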
./configure         [build the Makefile]
make                [builds the library and shared library versions of
all the progs]
sudo make install   [as root; this puts liblept.a into /usr/local/lib/
and all the progs into /usr/local/bin/ ]
2) Download Tesseract. It is now hosted on GitHub.
Download the code from GitHub and unpack it into some directory (for example /tmp/tesseract)
3) Install
./autogen.sh
./configure
make
sudo make install
sudo ldconfig
Note that a system installed this way is in /usr/local/bin, and its data files are in /usr/local/share/tessdata!
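The install steps above can also be scripted so that the build stops at the first failing command. This is only a sketch: the step list mirrors the commands above, while the names BUILD_STEPS and run_build are made up for illustration.

```python
import subprocess

# The same commands as listed above, in the same order.
BUILD_STEPS = [
    ["./autogen.sh"],
    ["./configure"],
    ["make"],
    ["sudo", "make", "install"],
    ["sudo", "ldconfig"],
]

def run_build(source_dir, run=subprocess.check_call):
    """Run each build step inside source_dir; check_call raises on the first failure."""
    for step in BUILD_STEPS:
        run(step, cwd=source_dir)
```

Calling run_build("/tmp/tesseract") would correspond to the example directory mentioned above. The run parameter exists only so the function can be exercised without actually building anything.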
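The following errors may occur along the way:
[1] If ./autogen.sh complains that a number of tools are missing, install the corresponding packages:
aclocal missing       sudo apt-get install automake
libtoolize missing    sudo apt-get install libtool
If it reports other missing tools, type the tool's name in the terminal and Ubuntu will tell you how to install it.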
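The check in [1] can be done up front instead of waiting for ./autogen.sh to fail. A minimal sketch, assuming only the two tool-to-package pairs listed above; TOOL_PACKAGES and missing_build_packages are illustrative names:

```python
import shutil

# Tool -> apt package, taken from the two cases listed above.
TOOL_PACKAGES = {"aclocal": "automake", "libtoolize": "libtool"}

def missing_build_packages(which=shutil.which):
    """Return the apt packages for the listed tools that are not found on PATH."""
    return sorted({pkg for tool, pkg in TOOL_PACKAGES.items() if which(tool) is None})

missing = missing_build_packages()
if missing:
    print("sudo apt-get install " + " ".join(missing))
else:
    print("aclocal and libtoolize are both available")
```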
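[2] Data problem
A system built from source has no data; at least one data pack (usually eng) must be installed before it will run. To install it:
download the data pack, then unpack it into /usr/local/share/tessdata
[3] Testing whether the installation succeeded
First test the installation itself: run tesseract. Output like the following means the installation succeeded!
/usr/local/share/tessdata$ tesseract
Usage:tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile…]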

pagesegmode values are:
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.
-l lang and/or -psm pagesegmode must occur before any configfile.

Single options:
  -v --version: version info
  --list-langs: list available languages for tesseract engine
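The usage text requires -l and -psm to come before any config file. A small sketch that builds the argument list in that order; the function name tesseract_command is hypothetical, and the -psm spelling is the one printed by the version shown above (other releases may spell the option differently):

```python
def tesseract_command(image, outputbase, lang=None, psm=None, configfiles=()):
    """Build a tesseract argument list in the order given by the usage text."""
    if psm is not None and not 0 <= psm <= 10:
        raise ValueError("pagesegmode must be between 0 and 10")
    cmd = ["tesseract", image, outputbase]
    if lang:
        cmd += ["-l", lang]
    if psm is not None:
        cmd += ["-psm", str(psm)]
    cmd += list(configfiles)  # config files always go last
    return cmd

# Treat the image as a single text line (pagesegmode 7).
print(" ".join(tesseract_command("phototest.tif", "test1", lang="eng", psm=7)))
```

The resulting list can be passed directly to subprocess.check_call.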
A common error is missing language data, as shown below. In that case, install the language data as described above (it is best to install eng: it is the system default and you will certainly need it):
Error opening data file /usr/local/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the
parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
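To see why the data file is not found, it helps to check where it actually sits relative to TESSDATA_PREFIX. The error message above asks for the parent directory of tessdata, while the export example elsewhere in this post points at the tessdata directory itself, so this sketch checks both layouts. The helper name find_traineddata and the fallback prefix /usr/local/share are assumptions for illustration.

```python
import os

def find_traineddata(prefix, lang="eng"):
    """Return the path of <lang>.traineddata under prefix, or None if it is absent.

    Checks prefix/tessdata/ first (prefix is the parent directory, as the
    error message asks), then prefix/ itself.
    """
    filename = lang + ".traineddata"
    for candidate in (os.path.join(prefix, "tessdata", filename),
                      os.path.join(prefix, filename)):
        if os.path.isfile(candidate):
            return candidate
    return None

prefix = os.environ.get("TESSDATA_PREFIX", "/usr/local/share")
print(find_traineddata(prefix) or "eng.traineddata not found under " + prefix)
```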
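Then test recognizing a file. The source directory contains a phototest.tif file that can be used for testing.
tesseract phototest.tif test1 -l eng
A common error is an incompatible Leptonica, as follows: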
Tesseract Open Source OCR Engine v3.02.02 with Leptonica
Error in findTiffCompression: function not present
Error in pixReadStreamTiff: function not present
Error in pixReadStream: tiff: no pix returned
Error in pixRead: pix not read
Unsupported image type.
I have not solved this problem yet. The methods suggested online did not work (I could not get them to work on Ubuntu 14.04).


2. If you need training, install the following libraries

sudo apt-get install libicu-dev

sudo apt-get install libpango1.0-dev

sudo apt-get install libcairo2-dev

OK, let's test it.

1. Install the required libraries

sudo apt-get install g++ # or clang++ (presumably)

sudo apt-get install autoconf automake libtool

sudo apt-get install autoconf-archive

sudo apt-get install pkg-config

sudo apt-get install libpng12-dev

sudo apt-get install libjpeg8-dev

sudo apt-get install libtiff5-dev

sudo apt-get install zlib1g-dev

If you get an error that a lib **.so file cannot be found, run

4. Install tesseract

cd git

git clone

cd tesseract

./autogen.sh

./configure --enable-debug

make

sudo make install

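Installation complete.

Use the tesseract -v command to check whether the installation succeeded (it prints the version number).

The official tesseract project provides pre-trained dictionaries (language data) that can be downloaded and used.

Address: https://github.com/tesseract-ocr/tessdata

Add the tessdata folder path to the environment: export TESSDATA_PREFIX=/your/path/tessdata

 For example: export TESSDATA_PREFIX=/usr/local/share/tessdata

Put the downloaded dictionaries into the tessdata folder,

 that is, into /usr/local/share/tessdata

Use the tesseract --list-langs command to see which dictionaries are currently installed. A dictionary must be loaded before tesseract can recognize text.

Command for recognizing a file: tesseract filename output -l lang

For example: tesseract chi.font.exp3.tif output -l chi_sim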
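The same command can be called from a program. The sketch below runs tesseract through subprocess and reads the result back, relying on the fact that tesseract writes plain-text output to outputbase with .txt appended. The function name recognize and its run parameter are illustrative, not part of any library.

```python
import subprocess

def recognize(image, outputbase, lang, run=subprocess.check_call):
    """Run `tesseract image outputbase -l lang` and return the recognized text.

    tesseract appends .txt to outputbase when it writes plain-text output.
    """
    run(["tesseract", image, outputbase, "-l", lang])
    with open(outputbase + ".txt", encoding="utf-8") as f:
        return f.read()
```

recognize("chi.font.exp3.tif", "output", "chi_sim") would correspond to the example command above. check_call raises an exception if tesseract exits with an error, for instance when the requested language data is missing.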

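tesseract-ocr is an open-source optical character recognition engine, supported by Google, that recognizes many languages. Below are the steps I followed to install it on Ubuntu.

Download a language data pack and unpack it. More language packs are available.

gzip -d eng.traineddata.gz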
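The unpacking step can also be done from Python with the standard gzip module, writing the result straight into the tessdata directory. A minimal sketch; the function name unpack_traineddata is made up for illustration, and unlike gzip -d it leaves the original .gz file in place.

```python
import gzip
import os
import shutil

def unpack_traineddata(gz_path, tessdata_dir):
    """Decompress a .traineddata.gz file into tessdata_dir and return the new path."""
    name = os.path.basename(gz_path)
    if not name.endswith(".gz"):
        raise ValueError("expected a .gz file: " + gz_path)
    target = os.path.join(tessdata_dir, name[:-3])  # drop the .gz suffix
    with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

For a source install, unpack_traineddata("eng.traineddata.gz", "/usr/local/share/tessdata") would put the file where the steps above expect it. Writing there normally requires root permissions.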
