Gentoo Forums
[Solved] tesseract ocr outputs empty files
The_Document
Apprentice


Joined: 03 Feb 2018
Posts: 275

Posted: Mon Mar 05, 2018 12:59 am    Post subject: [Solved] tesseract ocr outputs empty files

I have a lot of docs to OCR and I chose tesseract. Here's some info:
Code:

[ Legend : U - final flag setting for installation]
[        : I - package is installed with flag     ]
[ Colors : set, unset                             ]
 * Found these USE flags for app-text/tesseract-3.05.01:
 U I
 + + doc         : Add extra documentation (API, Javadoc, etc). It is recommended to enable per package instead of globally
 + + examples    : Install examples, usually source code
 + + jpeg        : Add JPEG image support
 + + l10n_ar     : Arabic
 + + l10n_bg     : Bulgarian
 + + l10n_ca     : Catalan
 + + l10n_chr    : Cherokee
 + + l10n_cs     : Czech
 + + l10n_da     : Danish
 + + l10n_de     : German
 + + l10n_el     : Modern Greek
 + + l10n_es     : Spanish
 + + l10n_fi     : Finnish
 + + l10n_fr     : French
 + + l10n_he     : Hebrew
 + + l10n_hi     : Hindi
 + + l10n_hu     : Hungarian
 + + l10n_id     : Indonesian
 + + l10n_it     : Italian
 + + l10n_ja     : Japanese
 + + l10n_ko     : Korean
 + + l10n_lt     : Lithuanian
 + + l10n_lv     : Latvian
 + + l10n_nl     : Dutch
 + + l10n_no     : Norwegian
 + + l10n_pl     : Polish
 + + l10n_pt     : Portuguese
 + + l10n_ro     : Romanian
 + + l10n_ru     : Russian
 + + l10n_sk     : Slovak
 + + l10n_sl     : Slovenian
 + + l10n_sr     : Serbian
 + + l10n_sv     : Swedish
 + + l10n_th     : Thai
 + + l10n_tl     : Tagalog
 + + l10n_tr     : Turkish
 + + l10n_uk     : Ukrainian
 + + l10n_vi     : Vietnamese
 + + l10n_zh-CN  : Chinese (China)
 + + l10n_zh-TW  : Chinese (Taiwan)
 + + math        : Enable support for recognition of equations.
 + + opencl      : Enable opencl support for speedup using GPU computation.
 + + osd         : Enable support orientation and script detection.
 + + png         : Add support for libpng (PNG images)
 + + scrollview  : Install viewer to debug recognition (ScrollView).
 + + static-libs : Build static versions of dynamic libraries as well
 + + tiff        : Add support for the TIFF image format
 + + training    : Install training applications to add support for new languages.
 + + webp        : Enable support for webp image format.


I don't understand why it doesn't work. I run:

Code:
$ tesseract test.png out.txt
[DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 1:AMD CARRIZO (DRM 3.23.0 / 4.15.6-gentoo, LLVM 5.0.1) score is 0.595572
[DS] Device[2] 0:(null) score is 3.304872
[DS] Selected Device[1]: "AMD CARRIZO (DRM 3.23.0 / 4.15.6-gentoo, LLVM 5.0.1)" (OpenCL)
Tesseract Open Source OCR Engine v3.05.01 with Leptonica


It makes a file called out.txt and it's empty; the OCR engine isn't working. How can I get it to work?


Last edited by The_Document on Mon Mar 05, 2018 7:09 am; edited 1 time in total
blopsalot
Apprentice


Joined: 28 Jan 2017
Posts: 231

Posted: Mon Mar 05, 2018 6:13 am

I don't think it autodetects the language.
Code:
tesseract imagename outputbase [-l lang] [--oem ocrenginemode] [--psm pagesegmode] [configfiles...]
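
For reference, here is a rough sketch of how that synopsis maps onto the command from the first post. The second argument is an output base, and tesseract appends .txt to it itself, so passing out.txt there is probably not what was intended. The helper name ocr_one and the TESSERACT override variable are made up for illustration, and the sketch assumes the English data (eng) is installed.

```shell
# Hypothetical helper: OCR one image and report whether any text came out.
# TESSERACT is an illustrative override; it defaults to tesseract in PATH.
ocr_one() {
    img=$1
    base=${img%.*}    # test.png -> test (tesseract appends .txt itself)
    "${TESSERACT:-tesseract}" "$img" "$base" -l eng || return 1
    if [ -s "$base.txt" ]; then
        echo "ok: $base.txt"
    else
        # Output file is missing or zero bytes long.
        echo "empty: $base.txt" >&2
        return 2
    fi
}
```

With a lot of documents, something like `for f in *.png; do ocr_one "$f"; done` would flag the images that came back empty.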
The_Document
Apprentice


Joined: 03 Feb 2018
Posts: 275

Posted: Mon Mar 05, 2018 6:23 am

blopsalot wrote:
I don't think it autodetects the language.
Code:
tesseract imagename outputbase [-l lang] [--oem ocrenginemode] [--psm pagesegmode] [configfiles...]


I read somewhere that the default is English if no language is specified, and I was told to disable OpenCL support.
https://github.com/tesseract-ocr/tesseract/issues/1361
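
For anyone landing here with the same symptom, one way to turn off the opencl USE flag for this package alone might look like the sketch below. It assumes /etc/portage/package.use is a directory (on some systems it is a single file), and both the function name and the file name "tesseract" are arbitrary choices for illustration. The thread does not record the exact steps that were used.

```shell
# Sketch: disable the opencl USE flag for app-text/tesseract only.
# The optional argument overrides the package.use directory, which is
# handy for a dry run; the real location needs root to write to.
disable_tesseract_opencl() {
    use_dir=${1:-/etc/portage/package.use}
    mkdir -p "$use_dir" || return 1
    echo 'app-text/tesseract -opencl' >> "$use_dir/tesseract"
}
```

After calling it as root, the package has to be rebuilt for the change to take effect, for example with `emerge --ask --oneshot app-text/tesseract`.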
blopsalot
Apprentice


Joined: 28 Jan 2017
Posts: 231

Posted: Mon Mar 05, 2018 6:36 am

Makes sense. Yeah, I still don't use it.
The_Document
Apprentice


Joined: 03 Feb 2018
Posts: 275

Posted: Mon Mar 05, 2018 7:08 am

Issue solved, it works. However, after testing tesseract I found it to be a poor OCR engine; I'm sure Acrobat is more accurate.