[Solved] tesseract ocr outputs empty files

The_Document
Apprentice

Joined: 03 Feb 2018
Posts: 275

Posted: Mon Mar 05, 2018 12:59 am Post subject: [Solved] tesseract ocr outputs empty files

I have a lot of documents to OCR and I chose tesseract. Here's some info:

Code:

[ Legend : U - final flag setting for installation]
[        : I - package is installed with flag     ]
[ Colors : set, unset                             ]
 * Found these USE flags for app-text/tesseract-3.05.01:
 U I
+ + doc : Add extra documentation (API, Javadoc, etc). It is recommended to enable per package instead of globally
+ + examples : Install examples, usually source code
+ + jpeg : Add JPEG image support
+ + l10n_ar : Arabic
+ + l10n_bg : Bulgarian
+ + l10n_ca : Catalan
+ + l10n_chr : Cherokee
+ + l10n_cs : Czech
+ + l10n_da : Danish
+ + l10n_de : German
+ + l10n_el : Modern Greek
+ + l10n_es : Spanish
+ + l10n_fi : Finnish
+ + l10n_fr : French
+ + l10n_he : Hebrew
+ + l10n_hi : Hindi
+ + l10n_hu : Hungarian
+ + l10n_id : Indonesian
+ + l10n_it : Italian
+ + l10n_ja : Japanese
+ + l10n_ko : Korean
+ + l10n_lt : Lithuanian
+ + l10n_lv : Latvian
+ + l10n_nl : Dutch
+ + l10n_no : Norwegian
+ + l10n_pl : Polish
+ + l10n_pt : Portuguese
+ + l10n_ro : Romanian
+ + l10n_ru : Russian
+ + l10n_sk : Slovak
+ + l10n_sl : Slovenian
+ + l10n_sr : Serbian
+ + l10n_sv : Swedish
+ + l10n_th : Thai
+ + l10n_tl : Tagalog
+ + l10n_tr : Turkish
+ + l10n_uk : Ukrainian
+ + l10n_vi : Vietnamese
+ + l10n_zh-CN : Chinese (China)
+ + l10n_zh-TW : Chinese (Taiwan)
+ + math : Enable support for recognition of equations.
+ + opencl : Enable opencl support for speedup using GPU computation.
+ + osd : Enable support orientation and script detection.
+ + png : Add support for libpng (PNG images)
+ + scrollview : Install viewer to debug recognition (ScrollView).
+ + static-libs : Build static versions of dynamic libraries as well
+ + tiff : Add support for the TIFF image format
+ + training : Install training applications to add support for new languages.
+ + webp : Enable support for webp image format.

I don't understand why it doesn't work. I run:

Code:

$ tesseract test.png out.txt
[DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 1:AMD CARRIZO (DRM 3.23.0 / 4.15.6-gentoo, LLVM 5.0.1) score is 0.595572
[DS] Device[2] 0:(null) score is 3.304872
[DS] Selected Device[1]: "AMD CARRIZO (DRM 3.23.0 / 4.15.6-gentoo, LLVM 5.0.1)" (OpenCL)
Tesseract Open Source OCR Engine v3.05.01 with Leptonica

It makes a file called out.txt and it's empty, so the OCR engine isn't working. How can I get it to work?

Last edited by The_Document on Mon Mar 05, 2018 7:09 am; edited 1 time in total

blopsalot
Apprentice

Joined: 28 Jan 2017
Posts: 231

Posted: Mon Mar 05, 2018 6:13 am Post subject:

I don't think it autodetects the language.

Code:

tesseract imagename outputbase [-l lang] [--oem ocrenginemode] [--psm pagesegmode] [configfiles...]
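Since the original question mentions a lot of documents, the synopsis above can be applied in a loop. This is a minimal sketch, not something posted in the thread: the function name `ocr_dir` and the `TESSERACT` override variable are made up for illustration, and it assumes the scans are PNG files and that the language data you name (here `eng` by default) is installed.

```shell
#!/bin/sh
# OCR every PNG in a directory, always passing an explicit language.
# TESSERACT is a hypothetical override variable (not a tesseract feature)
# so the loop can be dry-run with another command such as `echo`.
ocr_dir() {
    dir=$1
    lang=${2:-eng}
    for img in "$dir"/*.png; do
        [ -e "$img" ] || continue    # skip when the glob matched nothing
        base=${img%.png}             # tesseract appends .txt to this base
        "${TESSERACT:-tesseract}" "$img" "$base" -l "$lang"
    done
}
```

Calling `ocr_dir ./scans eng` would then write one `.txt` file next to each image. Setting `TESSERACT=echo` first prints the commands instead of running them, which is a cheap way to check the file names before a long batch.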

The_Document
Apprentice

Joined: 03 Feb 2018
Posts: 275

Posted: Mon Mar 05, 2018 6:23 am Post subject:

blopsalot wrote:

I don't think it autodetects the language.

Code:

tesseract imagename outputbase [-l lang] [--oem ocrenginemode] [--psm pagesegmode] [configfiles...]

I read somewhere that the default is English if no language is specified, and I was told to disable OpenCL support.
https://github.com/tesseract-ocr/tesseract/issues/1361
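The thread does not record the exact commands used, so the following is only a sketch of one way to turn off the `opencl` USE flag for this package on Gentoo. The helper name `disable_use_flag` and the default file path are assumptions for illustration. On many systems `/etc/portage/package.use` is a directory and any file name inside it works, while on others it is a single file.

```shell
#!/bin/sh
# Append "atom -flag" to a package.use file unless the line is already there.
# The default path is an assumption; pass a third argument to override it.
disable_use_flag() {
    atom=$1
    flag=$2
    file=${3:-/etc/portage/package.use/tesseract}
    line="$atom -$flag"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Typical use, as root (not run here):
#   disable_use_flag app-text/tesseract opencl
#   emerge --ask --oneshot app-text/tesseract
```

After the rebuild, running the same `tesseract` command again should no longer print the `[DS]` device selection lines if OpenCL support was really compiled out.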

blopsalot
Apprentice

Joined: 28 Jan 2017
Posts: 231

Posted: Mon Mar 05, 2018 6:36 am Post subject:

Makes sense. Yeah, I still don't use it.

The_Document
Apprentice

Joined: 03 Feb 2018
Posts: 275

Posted: Mon Mar 05, 2018 7:08 am Post subject:

Issue solved, it works. However, after testing tesseract I found it a poor OCR engine. I'm sure Acrobat is more accurate.
