D-LINC Tux's lil' helper
Joined: 31 Jan 2011 Posts: 135 Location: Alaska
Posted: Wed Jan 02, 2013 3:19 am Post subject: batch convert pdf to text |
Hi. Is there a tool in Portage for batch conversion of PDF to text? The only thing I have been able to find searching with eix is pstotext, but I'm not willing to use it because it isn't free software (PSTT license). I thought libreoffice was supposed to do batch conversion from the command line (with the --convert-to flag), but it hangs every time I try. _________________ frigidcode.com |
BillWho Veteran
Joined: 03 Mar 2012 Posts: 1600 Location: US
Posted: Wed Jan 02, 2013 5:19 am Post subject: |
D-LINC,
Did you look into pdftotext? _________________ Good luck
Since installing gentoo, my life has become one long emerge |
frostschutz Advocate
Joined: 22 Feb 2005 Posts: 2977 Location: Germany
Posted: Wed Jan 02, 2013 5:41 am Post subject: |
I've had some good results with pdftohtml / pdftotext (poppler), but never without manually adapting the output afterward. PDF is just not suited to conversions like this; it's really more of an image format than a text format. If the PDF is not overloaded with images, you may even get better results using OCR software.
For batch processing and conversion into various text/book formats you could have a peek at calibre. I never get any good results for PDF with it (it also uses pdftohtml, but it's harder to adapt to custom needs), but some people have more luck with it than I. |
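For the batch part with pdftotext, a small wrapper script is one way to go. The following is only a rough sketch, assuming pdftotext (from poppler) is on the PATH and that writing a .txt file next to each PDF is acceptable. The function names are made up for illustration; -layout is pdftotext's option for keeping the page layout, and it is off by default here.

```python
import subprocess
from pathlib import Path


def text_path_for(pdf: Path) -> Path:
    """Return the output path: same name as the PDF, with a .txt suffix."""
    return pdf.with_suffix(".txt")


def build_command(pdf: Path, keep_layout: bool = False) -> list[str]:
    """Build the pdftotext command line for a single PDF."""
    cmd = ["pdftotext"]
    if keep_layout:
        # Ask pdftotext to preserve the physical page layout.
        cmd.append("-layout")
    cmd += [str(pdf), str(text_path_for(pdf))]
    return cmd


def convert_all(directory: Path, keep_layout: bool = False) -> list[Path]:
    """Convert every *.pdf in `directory`; return the PDFs that failed.

    Assumes the pdftotext binary is installed and on the PATH.
    """
    failed = []
    for pdf in sorted(directory.glob("*.pdf")):
        result = subprocess.run(build_command(pdf, keep_layout))
        if result.returncode != 0:
            failed.append(pdf)
    return failed
```

Calling convert_all(Path(".")) would process the current directory and hand back the files pdftotext could not convert. The output will most likely still need the manual cleanup mentioned above.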