brent_weaver Guru
Joined: 01 Jul 2004 Posts: 510 Location: Burlington, VT
|
Posted: Fri May 03, 2013 1:01 pm Post subject: Removal of escape sequence in file |
|
|
I am trying to use an HTML file that was generated by MS Word... (I know... I know lol...) There is a strange escape sequence character in it, and I am trying to figure out what it is and then how to remove it from the file...
The hex value is
Code: |
nagios@usmke1nagvm01l # hexdump tmpp
0000000 0da0 000a
0000003
|
How do I get rid of this character? _________________ Brent Weaver |
|
|
|
John R. Graham Administrator
Joined: 08 Mar 2005 Posts: 10587 Location: Somewhere over Atlanta, Georgia
|
Posted: Fri May 03, 2013 1:08 pm Post subject: |
|
|
hexdump by default outputs 16-bit words, not bytes. Please provide a slightly more readable dump and, perhaps, a somewhat larger example.
- John _________________ I can confirm that I have received between 0 and 499 National Security Letters. |
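To illustrate the words-versus-bytes point, here is a minimal Python sketch. It assumes a little-endian machine (typical for x86) and assumes that an odd trailing byte is padded with zero, which is consistent with the `0da0 000a` output in the first post. The function names `dump_words` and `dump_bytes` are made up for this example.

```python
import struct

def dump_words(data: bytes) -> str:
    """Format data as 16-bit little-endian words (assumed default hexdump view)."""
    if len(data) % 2:
        data += b"\x00"  # assumption: pad an odd trailing byte with zero
    words = struct.unpack("<%dH" % (len(data) // 2), data)
    return " ".join("%04x" % w for w in words)

def dump_bytes(data: bytes) -> str:
    """Format data one byte at a time, in file order."""
    return " ".join("%02x" % b for b in data)

sample = b"\xa0\x0d\x0a"   # the three bytes in the file from the first post
print(dump_words(sample))  # 0da0 000a
print(dump_bytes(sample))  # a0 0d 0a
```

The byte-wise view shows the real order in the file: 0xA0 comes first, followed by 0x0D 0x0A.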
|
|
|
brent_weaver Guru
Joined: 01 Jul 2004 Posts: 510 Location: Burlington, VT
|
Posted: Fri May 03, 2013 1:52 pm Post subject: |
|
|
nagios@usmke1nagvm01l # od -tx1c -Ax tmpp
000000 a0 0d 0a
240 \r \n
000003
nagios@usmke1nagvm01l #
|
|
|
John R. Graham Administrator
Joined: 08 Mar 2005 Posts: 10587 Location: Somewhere over Atlanta, Georgia
|
Posted: Fri May 03, 2013 2:11 pm Post subject: |
|
|
Well, the 0x0D 0x0A is just the DOS/Windows line end convention. You can convert that to Unix/Linux line end conventions with app-text/dos2unix. As for the 0xA0, that's a non-breaking space (U+00A0) as encoded in ISO-8859-1 or Windows-1252; it's probably there on purpose. In other words, I think you should leave it alone.
- John
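A rough Python sketch of both points, under the assumption that the Word export is Windows-1252 encoded (common for Word-generated HTML, but not confirmed in this thread; the `charset` declared in the file is worth checking). The helper name `to_unix_utf8` is made up for this example.

```python
import unicodedata

raw = b"\xa0\r\n"  # the three bytes from the od dump

# Read as Latin-1, the first byte is the non-breaking space.
print(unicodedata.name(raw.decode("latin-1")[0]))  # NO-BREAK SPACE

# A lone 0xA0 is not valid UTF-8, so a UTF-8 reader cannot decode it.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)  # False

def to_unix_utf8(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode with the assumed source encoding, use LF line ends, emit UTF-8."""
    text = data.decode(source_encoding, errors="replace")
    text = text.replace("\r\n", "\n")
    return text.encode("utf-8")

print(to_unix_utf8(raw))  # b'\xc2\xa0\n'
```

This keeps the non-breaking space but stores it as the two-byte UTF-8 sequence 0xC2 0xA0, which may be preferable to deleting it if the page is served as UTF-8.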
|
|
|
brent_weaver Guru
Joined: 01 Jul 2004 Posts: 510 Location: Burlington, VT
|
Posted: Fri May 03, 2013 2:45 pm Post subject: |
|
|
Thanks for the information. This actually made it worse. Now there are square boxes (as presented by Apache) and triangles.
|
|
|
John R. Graham Administrator
Joined: 08 Mar 2005 Posts: 10587 Location: Somewhere over Atlanta, Georgia
|
Posted: Fri May 03, 2013 3:01 pm Post subject: |
|
|
Okay. A longer example would be helpful.
- John
|
|
|
cwr Veteran
Joined: 17 Dec 2005 Posts: 1969
|
Posted: Fri May 03, 2013 3:02 pm Post subject: |
|
|
The Nano editor converts unix <=> MSDOS <=> Mac text files; just
load the file and then save in the new format. For stripping out odd control
characters the 'tr' utility is pretty handy.
Will |
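For comparison, here is a small Python sketch of what a `tr -d` style deletion does at the byte level. The function names are made up for this example, and which bytes to delete is an assumption; in this thread the candidates would be 0xA0 and possibly the carriage return.

```python
def delete_bytes(data: bytes, unwanted: bytes) -> bytes:
    """Remove every occurrence of each byte listed in `unwanted`."""
    return data.translate(None, unwanted)

def replace_byte(data: bytes, old: bytes, new: bytes) -> bytes:
    """Swap one byte sequence for another, e.g. NBSP for a plain space."""
    return data.replace(old, new)

sample = b"Hello\xa0world\r\n"
print(delete_bytes(sample, b"\xa0\r"))       # b'Helloworld\n'
print(replace_byte(sample, b"\xa0", b" "))   # b'Hello world\r\n'
```

Deleting a non-breaking space outright can glue two words together, as the first output shows, so replacing it with an ordinary space is often the gentler option.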
|
|
|
brent_weaver Guru
Joined: 01 Jul 2004 Posts: 510 Location: Burlington, VT
|
Posted: Fri May 03, 2013 3:11 pm Post subject: |
|
|
Will -
I need to figure out what the escape sequences are first. How do I do that?
|
|
|
cwr Veteran
Joined: 17 Dec 2005 Posts: 1969
|
Posted: Sun May 05, 2013 3:14 pm Post subject: |
|
|
od -x gives you a file dump in hex format, or the editor bvi gives you a useful overview if you
understand vi commands. Unfortunately, bvi can edit a file but can't do a global search/replace.
Once you have an idea which escapes you want stripped out, tr is your friend.
Or you can write a simple filter in C or Python.
Will |
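As one possible shape for such a Python helper, the sketch below counts every byte that is neither printable ASCII nor tab, CR, or LF, which is a quick way to see which values are worth stripping. The function name `survey_unusual_bytes` and the choice of "allowed" bytes are assumptions made for this example, and the sample data is invented.

```python
from collections import Counter

def survey_unusual_bytes(data: bytes) -> Counter:
    """Count bytes that are neither printable ASCII nor tab/CR/LF."""
    allowed = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}
    return Counter(b for b in data if b not in allowed)

# Invented sample: one NBSP byte and one ESC byte among ordinary text.
sample = b"<p>Word\xa0output</p>\r\n\x1b[0m"
for value, count in sorted(survey_unusual_bytes(sample).items()):
    print("0x%02x  %d" % (value, count))
```

On a real file the input would come from something like `open("tmpp", "rb").read()`, using the file name from the earlier posts.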
|
|
|
|