Gentoo Forums
Simple solution to strange ֪ character entities
Gentoo Forums Forum Index :: Gentoo Forums Feedback
emiddleton
n00b


Joined: 22 Sep 2002
Posts: 8

Posted: Mon Jan 27, 2003 1:31 pm    Post subject: Simple solution to strange ֪ character entities

Would it be possible to filter the output with something like the following?

/&amp;#/&#/

There is a bug somewhere in the forum's version of phpBB that converts every & into &amp;, which destroys all the character entities used to encode non-European languages under the ISO-8859-1 encoding. With this change, the correct characters should display in at least some browsers (provided the right fonts are installed).
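A minimal Python sketch of the proposed filter, assuming the bug amounts to the post text being HTML-escaped once more than it should be. Here `html.escape` stands in for phpBB's escaping, and `restore_char_refs` is a hypothetical name, not part of phpBB.

```python
import html
import re

def restore_char_refs(escaped: str) -> str:
    """Undo the extra escaping for character entities only: '&amp;#' -> '&#'."""
    return re.sub(r"&amp;#", "&#", escaped)

# A browser posting Hebrew text from an ISO-8859-1 page sends '&#1502;'.
submitted = "&#1502;"
# The extra escaping turns it into '&amp;#1502;', which displays literally.
escaped = html.escape(submitted, quote=False)
print(escaped)                     # &amp;#1502;
print(restore_char_refs(escaped))  # &#1502;
```

A plain ampersand such as `AT&amp;T` is not followed by `#`, so the filter leaves it alone.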

For more information about character entities, see:
http://www.w3.org/TR/2000/REC-xml-20001006#sec-references


Last edited by emiddleton on Wed Jan 29, 2003 2:36 pm; edited 2 times in total
rac
Bodhisattva


Joined: 30 May 2002
Posts: 6553
Location: Japanifornia

Posted: Mon Jan 27, 2003 6:40 pm    Post subject:

In case anyone is wondering, the regex reads: /& amp;#/&#/ (disregard the space: I included it to defeat the feature being discussed). We need to study the issue a bit to make sure there are no unwanted side effects, but thanks a lot for the suggestion.

My current feeling is that this would break a lot of posts with '&' in them, and that's not acceptable. Do you have a suggestion for a way around this, or can you convince me I'm being silly?
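To see which posts the proposed filter would actually change, here is a small Python check, again assuming `html.escape` approximates the forum's escaping and using a hypothetical `render` helper. In this sketch plain ampersands are unaffected; the visible change is to posts that deliberately spell out an entity as text, which would then be displayed as the character itself.

```python
import html
import re

def render(post: str) -> str:
    """Escape the post like the forum does, then apply '&amp;#' -> '&#'."""
    escaped = html.escape(post, quote=False)
    return re.sub(r"&amp;#", "&#", escaped)

# A plain ampersand is still escaped, so a browser still shows '&'.
print(render("AT&T and R&D"))                     # AT&amp;T and R&amp;D
print(html.unescape(render("AT&T and R&D")))      # AT&T and R&D

# A post meant to *show* the text '&#65;' now carries a real entity,
# so a browser displays 'A' instead of the literal '&#65;'.
print(html.unescape(render("Type &#65; to get A")))  # Type A to get A
```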
_________________
For every higher wall, there is a taller ladder
emiddleton
n00b


Joined: 22 Sep 2002
Posts: 8

Posted: Wed Jan 29, 2003 3:28 pm    Post subject:

Thanks for the response.

(The spaces in the character entities below are there only to stop the conversion.)

Could you give an example of how not converting & characters to & amp; could cause the & character to display incorrectly?

The reference I quoted above explains what character entities are. Basically, any Unicode character that can't be represented in the current encoding (the forum uses ISO-8859-1, so this means anything that isn't a Western European language) is converted into one of two possible sequences:

' & # ' [0-9]+ ';'          decimal
' & # x' [0-9a-fA-F]+ ';'   hexadecimal

e.g. & # 1502;
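The two forms above correspond to the `CharRef` production in the XML specification. As a hedged Python sketch: `str.encode` with the `xmlcharrefreplace` error handler shows how a character outside ISO-8859-1 becomes a decimal reference, and a regex built from the two patterns restores only well-formed references, leaving every other `&amp;` alone. The name `restore_wellformed_refs` is hypothetical, and this is one possible refinement, not something the forum is known to run.

```python
import re

# CharRef forms from the XML spec: '&#' [0-9]+ ';' and '&#x' [0-9a-fA-F]+ ';'
CHAR_REF = r"#(?:[0-9]+|x[0-9a-fA-F]+);"
ESCAPED_CHAR_REF = re.compile(r"&amp;(" + CHAR_REF + r")")

def restore_wellformed_refs(escaped: str) -> str:
    """Turn '&amp;#...;' back into '&#...;' only if it is a valid reference."""
    # In Python's re.sub replacement, '&' is literal and \1 is the reference.
    return ESCAPED_CHAR_REF.sub(r"&\1", escaped)

# A character that ISO-8859-1 cannot represent becomes a decimal reference.
print("\u05de".encode("iso-8859-1", errors="xmlcharrefreplace"))  # b'&#1502;'

print(restore_wellformed_refs("&amp;#1502; &amp;#x5DE;"))  # &#1502; &#x5DE;
print(restore_wellformed_refs("&amp;#notanumber;"))        # &amp;#notanumber;
```

This still converts a post that deliberately spells out a valid reference as text, so it narrows the possible side effects rather than removing them.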

The number is the character's code point in the ISO/IEC 10646 character set. It's really not all that difficult: these are not random sequences of letters and numbers, and they may not contain spaces. You could also change the encoding to UTF-8 (which encodes English characters exactly as ASCII does), which would let these characters be sent directly, without character entities. If you go this way, be careful to specify the encoding in the HTTP headers as well.
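A short Python illustration of both points, using only the standard library: the number in a reference is the code point, and UTF-8 leaves ASCII text byte-for-byte unchanged while encoding other characters directly as multi-byte sequences. The `Content-Type` line is the standard HTTP way to declare the encoding; how the forum software would emit it is an open assumption here.

```python
import html

# The number in '&#1502;' is the code point U+05DE (HEBREW LETTER MEM).
char = html.unescape("&#1502;")
print(hex(ord(char)))     # 0x5de
print(char == chr(1502))  # True

# UTF-8 encodes ASCII (English) text exactly as ASCII does...
print("Gentoo".encode("utf-8") == "Gentoo".encode("ascii"))  # True
# ...and other characters directly, with no entity needed.
print(char.encode("utf-8"))  # b'\xd7\x9e'

# The encoding should also be declared in the HTTP response header, e.g.:
header = "Content-Type: text/html; charset=utf-8"
print(header)
```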

The Unicode standard page is at:

http://www.unicode.org/standard/

(Unicode uses the same character set as ISO/IEC 10646.)
All times are GMT