Gentoo Forums
Removal of escape sequence in file
Gentoo Forums Forum Index Portage & Programming
brent_weaver
Guru


Joined: 01 Jul 2004
Posts: 510
Location: Burlington, VT

Posted: Fri May 03, 2013 1:01 pm    Post subject: Removal of escape sequence in file

I am trying to use an HTML file that was generated by MS Word... (I know, I know, lol.) There is a strange escape-sequence character in it, and I am trying to figure out what it is and then how to remove it from the file.

The hex value is
Code:

nagios@usmke1nagvm01l # hexdump tmpp
0000000 0da0 000a
0000003


How do I get rid of this character?
_________________
Brent Weaver
John R. Graham
Administrator


Joined: 08 Mar 2005
Posts: 10587
Location: Somewhere over Atlanta, Georgia

Posted: Fri May 03, 2013 1:08 pm

hexdump by default outputs 16-bit words in host byte order, not bytes, so on a little-endian PC the two bytes of each word appear swapped. Provide a more readable byte-by-byte dump, please, and, perhaps, a somewhat larger example:
Code:
od -tx1c -Ax tmpp
- John
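To see why hexdump printed `0da0 000a` for bytes that od lists as `a0 0d 0a`, here is a minimal Python sketch. It assumes a little-endian host (such as x86) and that hexdump zero-pads the final odd byte, as the trailing `000a` suggests; the three byte values are the ones from the od dump later in this thread.

```python
import struct

# The three bytes od reports for the file: 0xA0, CR, LF.
data = bytes([0xA0, 0x0D, 0x0A])

# hexdump's default view groups bytes into 16-bit words in host byte order.
# Pad to an even length with a zero byte (assumption based on the "000a"
# in the dump), then decode as little-endian unsigned shorts, as on x86.
padded = data + b"\x00" * (len(data) % 2)
words = struct.unpack("<%dH" % (len(padded) // 2), padded)
print(" ".join("%04x" % w for w in words))  # 0da0 000a

# A byte-by-byte view, similar in spirit to `od -tx1`.
print(" ".join("%02x" % b for b in data))   # a0 0d 0a
```

The low byte of each word is printed last, which is why `a0 0d` shows up as `0da0`.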
_________________
I can confirm that I have received between 0 and 499 National Security Letters.
brent_weaver
Guru

Posted: Fri May 03, 2013 1:52 pm

Code:

nagios@usmke1nagvm01l # od -tx1c -Ax tmpp
000000  a0  0d  0a
       240  \r  \n
000003
nagios@usmke1nagvm01l #
John R. Graham
Administrator

Posted: Fri May 03, 2013 2:11 pm

Well, the 0x0D 0x0A is just the DOS/Windows line-end convention (CR LF). You can convert that to the Unix/Linux line-end convention (LF alone) with app-text/dos2unix. As for the 0xA0, that's a non-breaking space (Unicode U+00A0), stored here as a single Windows-1252/ISO-8859-1 byte rather than as UTF-8; it's probably there on purpose. In other words, I think you should leave it alone.

- John
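A minimal Python sketch of both points, using the three bytes from the od dump. The CR LF to LF replacement mirrors the core of what dos2unix does by default; treating the file as Windows-1252 (`cp1252`) is an assumption, though a lone 0xA0 byte, which is not valid UTF-8, is consistent with a single-byte Windows code page. `crlf_to_lf` is a hypothetical helper name.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Rewrite DOS/Windows CR LF line endings as Unix LF."""
    return data.replace(b"\r\n", b"\n")

raw = bytes([0xA0, 0x0D, 0x0A])  # NBSP, CR, LF as shown by od

unix = crlf_to_lf(raw)
print(unix.hex())  # a00a

# A lone 0xA0 cannot start a UTF-8 sequence, so strict UTF-8 decoding fails.
try:
    unix.decode("utf-8")
except UnicodeDecodeError as err:
    print("not UTF-8:", err.reason)  # not UTF-8: invalid start byte

# Decoding as cp1252 (assumed encoding) yields U+00A0 NO-BREAK SPACE.
text = unix.decode("cp1252")
print(repr(text))                  # '\xa0\n'
print(text.encode("utf-8").hex())  # c2a00a
```

Note that re-encoding keeps the non-breaking space (as the two UTF-8 bytes `c2 a0`) rather than deleting it, in line with the advice to leave it alone.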
brent_weaver
Guru

Posted: Fri May 03, 2013 2:45 pm

Thanks for the information. This actually made it worse: now I get square boxes (as presented by Apache) and triangles.
John R. Graham
Administrator

Posted: Fri May 03, 2013 3:01 pm

Okay. A longer example would be helpful.

- John
_________________
I can confirm that I have received between 0 and 499 National Security Letters.
Back to top
View user's profile Send private message
cwr
Veteran


Joined: 17 Dec 2005
Posts: 1969

Posted: Fri May 03, 2013 3:02 pm

The nano editor converts between Unix, MS-DOS and Mac line-ending formats; just load the file and then save it in the new format. For stripping out odd control characters, the 'tr' utility is pretty handy.

Will
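As an illustration of the tr suggestion, the two helpers below (hypothetical names `tr_delete` and `tr_map`) sketch in Python what `tr -d '\r'` and `tr '\240' ' '` do to the bytes from the od dump. Like tr, they work byte by byte, so they assume a single-byte encoding such as Windows-1252.

```python
def tr_delete(data: bytes, chars: bytes) -> bytes:
    """Delete every byte listed in `chars`, like `tr -d`."""
    return data.translate(None, chars)

def tr_map(data: bytes, src: bytes, dst: bytes) -> bytes:
    """Replace each byte of `src` with the byte at the same position in `dst`, like `tr SRC DST`."""
    return data.translate(bytes.maketrans(src, dst))

raw = bytes([0xA0, 0x0D, 0x0A])

print(tr_delete(raw, b"\r").hex())       # a00a    (tr -d '\r')
print(tr_map(raw, b"\xa0", b" ").hex())  # 200d0a  (tr '\240' ' ')
```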
brent_weaver
Guru

Posted: Fri May 03, 2013 3:11 pm

Will -

I need to figure out what the escape sequences are first. How do I do that?
cwr
Veteran

Posted: Sun May 05, 2013 3:14 pm

od -x gives you a file dump in hex (in 2-byte words, like hexdump; od -tx1 shows single bytes), or the editor bvi gives you a useful overview if you understand vi commands. Unfortunately, bvi can edit a file but can't do a global search/replace. Once you have an idea which escapes you want stripped out, tr is your friend.

Or you can write a simple filter in C or Python.

Will
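Along the lines of the "simple filter in Python" idea, here is a minimal sketch that first surveys which unusual bytes a file contains and then strips the chosen ones. The helper names (`survey`, `describe`, `strip`), the definition of an "ordinary" byte, the made-up sample line, and the assumption that the file is Windows-1252 are all illustrative choices, not something from the thread.

```python
import unicodedata
from collections import Counter

# Bytes treated as "ordinary": printable ASCII plus tab and LF.
ORDINARY = set(range(0x20, 0x7F)) | {0x09, 0x0A}

def describe(b: int) -> str:
    """Name a byte, assuming the file is Windows-1252 (an assumption)."""
    try:
        return unicodedata.name(bytes([b]).decode("cp1252"))
    except (UnicodeDecodeError, ValueError):
        # Control characters have no Unicode name; a few bytes are undefined in cp1252.
        return "control or undefined byte"

def survey(data: bytes) -> Counter:
    """Count every byte value that is not in ORDINARY."""
    return Counter(b for b in data if b not in ORDINARY)

def strip(data: bytes, unwanted: bytes) -> bytes:
    """Delete the chosen byte values, much like `tr -d`."""
    return data.translate(None, unwanted)

sample = b"Hello\xa0world\r\n"  # made-up sample line, not from the original file
counts = survey(sample)
for b, n in sorted(counts.items()):
    print(f"0x{b:02x} (octal {b:03o}) x{n}: {describe(b)}")
# 0x0d (octal 015) x1: control or undefined byte
# 0xa0 (octal 240) x1: NO-BREAK SPACE

print(strip(sample, b"\r"))  # b'Hello\xa0world\n'
```

To run it over a real file, read the bytes with `open(path, "rb").read()` and write the result back in binary mode; the octal column matches the notation tr expects (e.g. `\240`).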
All times are GMT