zhenlin wrote:Not if you're looking at >33% Greek (or Cyrillic, or any other characters in the U+0080 to U+07FF range). Why? Because those take up 2 bytes each in UTF-8. So... when 33% of all characters fall in that range, the bytes spent on them roughly equal the bytes spent on the ASCII characters, a 1:1 ratio, which is acceptable. If it grows past 1:1, it begins to make more sense to use an encoding that can encode those characters in one byte.
When you start looking at CJK characters, you see 3 bytes per character in UTF-8... whereas UTF-16 needs only 2 bytes per character.
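You can check those byte counts yourself; here's a quick Python sketch (the sample strings are just illustrative, and the UTF-16 sizes use little-endian without a BOM):

```python
# Compare encoded sizes of the same text in UTF-8 vs UTF-16.
samples = {
    "ASCII": "hello world",
    "Greek": "γειά σου κόσμε",   # letters in the U+0080–U+07FF range: 2 bytes each in UTF-8
    "CJK":   "你好，世界",        # 3 bytes each in UTF-8, 2 each in UTF-16
}

for name, text in samples.items():
    utf8_len = len(text.encode("utf-8"))
    utf16_len = len(text.encode("utf-16-le"))  # "-le" so no BOM is prepended
    print(f"{name}: {len(text)} chars -> UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

For the ASCII sample UTF-8 wins outright (11 bytes vs 22); for Greek the two come out nearly even (spaces are still 1 byte in UTF-8); for the CJK sample UTF-16 wins (10 bytes vs 15).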
As you can see, there are trade-offs. UTF-8 is suitable for a majority-ASCII datascape... but how long is that going to last?
Maybe someone will develop a more effective way of encoding Unicode... perhaps a system with codepage selectors? Ack... that sounds nasty and stupid, and would be difficult to implement on top of it.
That's right. But let's use what solves the problem now.
(utf8, mod_gzip, etc.)
Using codepages isn't even an option, because we do use many characters from many languages. Unicode even has characters for the international system of units (like a Hz character), math symbols, etc. that can be really useful.