The_Document Apprentice
Joined: 03 Feb 2018 Posts: 275
|
Posted: Mon Mar 05, 2018 12:59 am Post subject: [Solved] tesseract ocr outputs empty files |
|
|
I have a lot of documents to OCR and I chose Tesseract. Here's some info:
Code: |
[ Legend : U - final flag setting for installation]
[ : I - package is installed with flag ]
[ Colors : set, unset ]
* Found these USE flags for app-text/tesseract-3.05.01:
U I
+ + doc : Add extra documentation (API, Javadoc, etc). It is recommended to enable per package instead of globally
+ + examples : Install examples, usually source code
+ + jpeg : Add JPEG image support
+ + l10n_ar : Arabic
+ + l10n_bg : Bulgarian
+ + l10n_ca : Catalan
+ + l10n_chr : Cherokee
+ + l10n_cs : Czech
+ + l10n_da : Danish
+ + l10n_de : German
+ + l10n_el : Modern Greek
+ + l10n_es : Spanish
+ + l10n_fi : Finnish
+ + l10n_fr : French
+ + l10n_he : Hebrew
+ + l10n_hi : Hindi
+ + l10n_hu : Hungarian
+ + l10n_id : Indonesian
+ + l10n_it : Italian
+ + l10n_ja : Japanese
+ + l10n_ko : Korean
+ + l10n_lt : Lithuanian
+ + l10n_lv : Latvian
+ + l10n_nl : Dutch
+ + l10n_no : Norwegian
+ + l10n_pl : Polish
+ + l10n_pt : Portuguese
+ + l10n_ro : Romanian
+ + l10n_ru : Russian
+ + l10n_sk : Slovak
+ + l10n_sl : Slovenian
+ + l10n_sr : Serbian
+ + l10n_sv : Swedish
+ + l10n_th : Thai
+ + l10n_tl : Tagalog
+ + l10n_tr : Turkish
+ + l10n_uk : Ukrainian
+ + l10n_vi : Vietnamese
+ + l10n_zh-CN : Chinese (China)
+ + l10n_zh-TW : Chinese (Taiwan)
+ + math : Enable support for recognition of equations.
+ + opencl : Enable opencl support for speedup using GPU computation.
+ + osd : Enable support orientation and script detection.
+ + png : Add support for libpng (PNG images)
+ + scrollview : Install viewer to debug recognition (ScrollView).
+ + static-libs : Build static versions of dynamic libraries as well
+ + tiff : Add support for the TIFF image format
+ + training : Install training applications to add support for new languages.
+ + webp : Enable support for webp image format.
|
I don't understand why it doesn't work. I run:
Code: | $ tesseract test.png out.txt
[DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 1:AMD CARRIZO (DRM 3.23.0 / 4.15.6-gentoo, LLVM 5.0.1) score is 0.595572
[DS] Device[2] 0:(null) score is 3.304872
[DS] Selected Device[1]: "AMD CARRIZO (DRM 3.23.0 / 4.15.6-gentoo, LLVM 5.0.1)" (OpenCL)
Tesseract Open Source OCR Engine v3.05.01 with Leptonica
|
It creates a file called out.txt, but it's empty; the OCR engine isn't working. How can I get it to work?
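A minimal sketch for checking the result, assuming a POSIX shell with the `tesseract` CLI on PATH (the helper name `ocr_check` is hypothetical). Tesseract's second argument is an output base and it appends `.txt` itself, so the sketch passes `out` rather than `out.txt`. It counts a file that holds only whitespace as empty, since a blank result can still contain a newline or form feed.

```shell
#!/bin/sh
# Sketch: OCR one image, then report whether the text output is effectively empty.
# Assumes the tesseract CLI is on PATH; ocr_check is a hypothetical helper name.
ocr_check() {
    img=$1
    base=$2           # output base: tesseract appends ".txt" itself
    lang=${3:-eng}    # default to English, as tesseract does without -l

    # A non-zero exit from tesseract is a different failure from empty text.
    tesseract "$img" "$base" -l "$lang" || return 2

    # Count the output as empty unless it has at least one non-whitespace
    # character (a missing output file is also reported as empty).
    if grep -q '[^[:space:]]' "$base.txt" 2>/dev/null; then
        echo "ok: $base.txt contains text"
        return 0
    fi
    echo "empty: $base.txt has no recognized text" >&2
    return 1
}
```

For example, `ocr_check test.png out eng` returns 0 if out.txt has text, 1 if it is blank, and 2 if tesseract itself failed.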
Last edited by The_Document on Mon Mar 05, 2018 7:09 am; edited 1 time in total |
|
|
|
blopsalot Apprentice
Joined: 28 Jan 2017 Posts: 231
|
Posted: Mon Mar 05, 2018 6:13 am Post subject: |
|
|
I don't think it autodetects the language.
Code: | tesseract imagename outputbase [-l lang] [--oem ocrenginemode] [--psm pagesegmode] [configfiles...] |
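Tesseract uses English (`eng`) when `-l` is omitted, but passing the language explicitly and confirming its traineddata is installed rules that out as a cause. A sketch, assuming `tesseract --list-langs` prints a header line followed by one language code per line; the helper names `has_lang` and `ocr_lang` are hypothetical:

```shell
#!/bin/sh
# Sketch: run OCR with an explicit language, but only if its data is installed.
# has_lang and ocr_lang are hypothetical helper names.

# `tesseract --list-langs` prints a header line, then one code per line.
# 2>&1 is there in case a given version writes the list to stderr.
has_lang() {
    tesseract --list-langs 2>&1 | grep -qx "$1"
}

ocr_lang() {
    img=$1
    base=$2
    lang=$3
    if ! has_lang "$lang"; then
        echo "no traineddata for '$lang'; install it or pick another -l value" >&2
        return 1
    fi
    tesseract "$img" "$base" -l "$lang"
}
```

For example, `ocr_lang test.png out eng` for English, or `ocr_lang test.png out deu` for German text.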
|
|
|
|
The_Document Apprentice
Joined: 03 Feb 2018 Posts: 275
|
Posted: Mon Mar 05, 2018 6:23 am Post subject: |
|
|
blopsalot wrote: | I don't think it autodetects the language.
Code: | tesseract imagename outputbase [-l lang] [--oem ocrenginemode] [--psm pagesegmode] [configfiles...] |
|
I read somewhere that the default is English if no language is specified, and I was told to disable OpenCL support.
https://github.com/tesseract-ocr/tesseract/issues/1361 |
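On Gentoo, OpenCL support comes from the `opencl` USE flag shown above, so one way to follow that advice is to turn the flag off for this package only and rebuild. A sketch, assuming `/etc/portage/package.use` is a directory (if it is a single file, append the line there instead); the file name `tesseract` and the helper name `disable_tesseract_opencl` are arbitrary choices:

```shell
#!/bin/sh
# Sketch (Gentoo): disable the opencl USE flag for tesseract only.
# Assumes /etc/portage/package.use is a directory; the file name "tesseract"
# and the helper name are arbitrary. Pass another path to try it without root.
disable_tesseract_opencl() {
    conf=${1:-/etc/portage/package.use/tesseract}
    entry='app-text/tesseract -opencl'
    mkdir -p "$(dirname "$conf")" || return 1
    # Append the entry only once, so running this twice is harmless.
    grep -qxF "$entry" "$conf" 2>/dev/null || echo "$entry" >> "$conf"
}
```

Run it as root, then rebuild with `emerge --ask --oneshot app-text/tesseract`. Once tesseract is built without OpenCL, the `[DS]` device-selection lines should no longer appear in its output.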
|
|
|
blopsalot Apprentice
Joined: 28 Jan 2017 Posts: 231
|
Posted: Mon Mar 05, 2018 6:36 am Post subject: |
|
|
Makes sense. Yeah, I still don't use it. |
|
|
|
The_Document Apprentice
Joined: 03 Feb 2018 Posts: 275
|
Posted: Mon Mar 05, 2018 7:08 am Post subject: |
|
|
Issue solved, it works. However, after testing Tesseract I found it a poor OCR engine; I'm sure Acrobat is more accurate. |
|
|
|
|