How to convert PDF to TEX?


Post by nahpets » Sat Jul 10, 2004 7:42 pm

I have a PDF ebook that isn't well formatted. I want to extract all the text into a .tex file and generate a new, nicely formatted PDF document using LaTeX. I've tried doing this two ways:

First
- pdftotext to get a .txt file with all the text
- enscript to generate an RTF file
- unrtf to get a .tex file

This method doesn't really work because I lose all the formatting and line breaks. I basically get one long paragraph in LaTeX.
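
One way to avoid the "one long paragraph" result is to skip enscript and unrtf and go from the pdftotext output straight to LaTeX. The following is only a minimal sketch, not a tested pipeline. It assumes the .txt file still has blank lines between paragraphs (this depends on the PDF) and that pdftotext separated pages with form feed characters. The function names and the choice of the book document class are made up for illustration.

```python
import re

# Characters that LaTeX treats specially, mapped to their escaped forms.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape LaTeX special characters in a single pass over the text."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


def text_to_latex(text):
    """Wrap plain text in a LaTeX document, keeping paragraph breaks."""
    # Assumption: form feeds mark page breaks; treat them as paragraph breaks.
    text = text.replace("\f", "\n\n")
    # One or more blank lines end a paragraph.
    paragraphs = re.split(r"\n\s*\n", text)
    body = []
    for para in paragraphs:
        # Re-join hard-wrapped lines inside a paragraph.
        joined = " ".join(
            line.strip() for line in para.splitlines() if line.strip()
        )
        if joined:
            body.append(escape_latex(joined))
    return (
        "\\documentclass{book}\n"
        "\\begin{document}\n\n"
        + "\n\n".join(body)
        + "\n\n\\end{document}\n"
    )
```

Usage would be something like reading book.txt (a hypothetical file name), passing its contents to text_to_latex(), and writing the result to book.tex. Headings, bold and italics are still lost, since pdftotext does not keep them.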

Second
- pdftohtml generates one big HTML file with the proper formatting in HTML.
- I open the HTML in ooffice and "save as" a .txt file
- enscript and unrtf like before to get a .tex file

The problem with this method is that ooffice sometimes refuses to open the whole document, I guess because it's too big. Does anyone know of a better way to get what I want?

html2latex


Post by boglin » Sat Jul 10, 2004 7:52 pm

What about tweaking the second method? Instead of using OpenOffice to convert HTML to text and then text to LaTeX, use html2latex.

Post by nahpets » Sun Jul 11, 2004 10:01 am

What ebuild does "html2latex" belong to? I found "html2text"...

I'm retarded


Post by boglin » Fri Jul 16, 2004 6:12 pm

Ermm, my bad: html2latex does not exist. Sorry.

Post by MacMasta » Sat Jul 17, 2004 12:03 am

You meant 'latex2html', which is most helpful.

I don't think there'll be much of a way to maintain the formatting; you'll need to do most of that yourself, once you get the paragraph...


~Mac~

Post by nahpets » Sat Jul 17, 2004 6:55 am

Yeah, I figured as much. Doing it myself will take a long time, so I think I'll give up on it.

Post by machinelou » Mon Jul 26, 2004 4:34 pm

Wouldn't it be relatively simple to write a bash or perl script to extract the text from the HTML file for your second method? That way you wouldn't need to use OpenOffice. Maybe there's a step or something I'm missing. Or, why not use pdftotext to extract

I just stumbled across this thread while looking for some way to convert my PDF files into a more open format. I often wish I could catalog and search my collection of academic papers (currently stored on my disk as PDF files) without opening each one manually in Acrobat or something. pdftotext and pdftohtml just might be the ticket, but maybe a tool already exists to search a bunch of PDF files?
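
For a small collection, a keyword search built on pdftotext could look roughly like this. It is a sketch under assumptions: pdftotext has to be on the PATH, its "-" argument is used to send the text to stdout, and the function names (pdf_to_text, find_matches, search_pdfs) are invented for this example. It rescans every file on each search, so it is not an index.

```python
import subprocess
from pathlib import Path


def pdf_to_text(pdf_path):
    """Extract the text of one PDF by running pdftotext ("-" means stdout)."""
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def find_matches(text, keyword):
    """Return (line_number, line) pairs for lines containing keyword, ignoring case."""
    needle = keyword.lower()
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line.lower()
    ]


def search_pdfs(directory, keyword, extract=pdf_to_text):
    """Search every *.pdf below directory; returns {path: [(line_number, line), ...]}."""
    hits = {}
    for pdf in sorted(Path(directory).rglob("*.pdf")):
        try:
            matches = find_matches(extract(pdf), keyword)
        except (OSError, subprocess.CalledProcessError):
            # Skip files pdftotext cannot read (or every file if pdftotext is missing).
            continue
        if matches:
            hits[str(pdf)] = matches
    return hits
```

Calling search_pdfs("papers", "entropy") (both arguments are placeholders) would return the matching lines per file. The extract parameter is there so that a different text extractor can be swapped in.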

Post by nahpets » Tue Jul 27, 2004 4:09 am

If you just want to search PDF files for keywords, I guess you can use pdftotext since formatting doesn't matter. In my case, I wanted to preserve the formatting. The only tool I found that preserves the formatting was pdftohtml. I could have written a bash/python/c/java program to parse the HTML to give me text, but it wasn't worth the effort.
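
For reference, the kind of program mentioned above does not have to be long. Here is a rough Python sketch that turns simple HTML into LaTeX using only the standard library. The tag mapping is an assumption, not a description of what pdftohtml actually emits: bold and italic tags become \textbf and \textit, h1/h2 become unnumbered sections, p/div/hr start a new paragraph, and br becomes a plain newline. Unbalanced tags in the input would produce unbalanced braces, so the output still needs checking by hand.

```python
import re
from html.parser import HTMLParser

LATEX_ESCAPES = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}

# Assumed mapping from HTML tags to LaTeX; extend it for the markup in your file.
OPENING = {
    "b": r"\textbf{", "strong": r"\textbf{",
    "i": r"\textit{", "em": r"\textit{",
    "h1": "\n\n\\section*{", "h2": "\n\n\\subsection*{",
}
CLOSING = {
    "b": "}", "strong": "}", "i": "}", "em": "}",
    "h1": "}\n\n", "h2": "}\n\n",
}
PARAGRAPH_TAGS = {"p", "div", "hr"}
SKIPPED = {"head", "script", "style"}  # content of these tags is dropped


class HtmlToLatex(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skipping = 0  # nesting depth inside SKIPPED tags

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skipping += 1
        elif self.skipping:
            return
        elif tag in OPENING:
            self.parts.append(OPENING[tag])
        elif tag in PARAGRAPH_TAGS:
            self.parts.append("\n\n")
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIPPED:
            self.skipping = max(0, self.skipping - 1)
        elif self.skipping:
            return
        elif tag in CLOSING:
            self.parts.append(CLOSING[tag])
        elif tag in PARAGRAPH_TAGS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        if not self.skipping:
            # Entities such as &amp; are already decoded; escape for LaTeX.
            self.parts.append("".join(LATEX_ESCAPES.get(c, c) for c in data))


def html_to_latex(html):
    """Return a LaTeX body fragment (no preamble) for the given HTML string."""
    parser = HtmlToLatex()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of blank lines into a single paragraph break.
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

The result is only the document body, so it would still need a \documentclass line and a document environment around it. Because the parser reads the file as a stream, the size of the HTML file should matter much less than it does when opening it in an office suite.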

Post by slartibartfasz » Tue Jul 27, 2004 7:09 am

machinelou wrote: [...] but maybe a tool already exists to search a bunch of PDF files?
htdig (http://www.htdig.org) allows indexing of PDF files via pdftotext.

Post by nahpets » Tue Aug 03, 2004 6:24 am

machinelou wrote: I just stumbled across this thread while looking for some way to convert my PDF files into a more open format. I often wish I could catalog and search my collection of academic papers (currently stored on my disk as PDF files) without opening each one manually in Acrobat or something. pdftotext and pdftohtml just might be the ticket, but maybe a tool already exists to search a bunch of PDF files?
This may be what you're looking for:
http://multivalent.sourceforge.net/