Gentoo Forums
UTF8 issue. Problems with Apache corrupting files. [SOLVED]
biznatch
Apprentice


Joined: 23 Jul 2004
Posts: 220
Location: Wichita, KS

Posted: Fri Feb 03, 2006 3:34 pm    Post subject: UTF8 issue. Problems with Apache corrupting files. [SOLVED]

When I upload files via vsftpd to my Gentoo/Apache server, some strange symbols end up embedded in them. It looks like garbage ASCII text.

When I use Firefox to view the site, it displays the junk characters, but they do not show up in IE. When I look at the pages locally (before uploading), they look fine in both browsers.

The junk shows up BEFORE any of my HTML data:

Code:
View source from firefox...
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>


Code:
# head index.html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>



Code:
# file index.html
index.html: UTF-8 Unicode English text, with very long lines, with CRLF line terminators


Any idea what could be going on here?

*** Update #1 ***

If I look at the file in vi, it does not show the junk characters.

Code:
# vi index.html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>

<head>


*** Update #2 ***

If I cat the file, it shows the following...

Code:
<EF><BB><BF><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>

<head>


<EF><BB><BF> seems to be the Unicode byte-order mark (BOM) as encoded in UTF-8. Now the only questions are how it got there and how I can remove it.
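
To confirm that a file really starts with those three bytes, something like the following should work. This is only a minimal sketch: has_bom is a made-up helper name, and the demo runs on a throwaway temp file rather than the real index.html.

```shell
# Hypothetical helper: succeeds if the file starts with the UTF-8 BOM (EF BB BF).
has_bom() {
    # od prints the first three bytes as hex; tr strips the spaces and newline
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Demo on a throwaway file (octal escapes 357 273 277 are the bytes EF BB BF)
sample=$(mktemp)
printf '\357\273\277<!DOCTYPE HTML>\n' > "$sample"
if has_bom "$sample"; then echo "BOM present"; else echo "no BOM"; fi
rm -f "$sample"
```

Run against the uploaded page it would be called as has_bom index.html.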
_________________
While you're waiting for your post to be answered, please help with unanswered posts.


Last edited by biznatch on Sat Feb 18, 2006 2:50 pm; edited 2 times in total
pivertd
Apprentice


Joined: 08 Feb 2004
Posts: 185
Location: Arlon, Belgium

Posted: Sat Feb 18, 2006 2:40 pm

Hello,

I have exactly the same problem.
The BOM seems to be added when you edit the file with an editor that writes one.
Notepad on Windows 2003 seems to behave this way (to be confirmed).

My only question is: how do I remove the BOM?
It can be removed by dropping the first 3 bytes of the file, so an alternative question is:
How can I remove the first 3 bytes of a file?

Regards,
biznatch
Apprentice


Joined: 23 Jul 2004
Posts: 220
Location: Wichita, KS

Posted: Sat Feb 18, 2006 2:48 pm

Are you using FrontPage to edit your web page? I found that you can go into the page properties and set the character encoding to "US/Western European ISO" to resolve the problem. Other HTML editors may support this as well.

Alternatively, you can modify your httpd.conf file to serve UTF-8 as the default character set.

Code:
# grep AddDefaultCharset /etc/apache2/httpd.conf
#AddDefaultCharset ISO-8859-1
AddDefaultCharset UTF-8
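
If you would rather strip the BOM from the uploaded file itself, dropping its first 3 bytes with tail should do it. The following is only a rough sketch: strip_bom is a made-up helper name, it assumes head, od, tail and mktemp are available, and it leaves files without a BOM untouched.

```shell
# Hypothetical helper: rewrite a file in place without its leading UTF-8 BOM.
strip_bom() {
    f=$1
    # Only touch files whose first three bytes are EF BB BF
    if [ "$(head -c 3 "$f" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
        tmp=$(mktemp) || return 1
        # tail -c +4 starts output at byte 4, i.e. it skips the 3-byte BOM;
        # writing back with cat keeps the original file's owner and permissions
        tail -c +4 "$f" > "$tmp" && cat "$tmp" > "$f"
        status=$?
        rm -f "$tmp"
        return $status
    fi
}

# Demo on a throwaway file rather than a real page
page=$(mktemp)
printf '\357\273\277<html>\n' > "$page"
strip_bom "$page"
cat "$page"
rm -f "$page"
```

On the server it would be called as strip_bom index.html. Try it on a copy first.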
