Gentoo Forums
UTF-8 support: how??

barmaley
n00b

Joined: 26 Nov 2002
Posts: 36

Posted: Tue Feb 11, 2003 1:50 am    Post subject: UTF-8 support: how??

It seems to me that UTF-8 support is broken on gentoo (or requires some steps during installation?). For example, if I do

Code:
export LC_CTYPE=en_US.UTF-8


and run some application, I get the following message:

Code:
Warning: locale not supported by C library, locale unchanged


Is it possible to enable the UTF-8 locale?
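
A quick way to see whether the C library already knows a locale is to look it up in the output of "locale -a". The sketch below is only an illustration: the function names normalize_locale and locale_in_list are made up here, and it assumes "locale -a" prints one name per line and may spell the codeset as utf8 instead of UTF-8.

```shell
#!/bin/sh
# normalize_locale NAME: lower-case the name and drop hyphens, so that
# en_US.UTF-8 and en_US.utf8 compare equal. (Hypothetical helper.)
normalize_locale() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# locale_in_list NAME: read "locale -a" style output on stdin and
# succeed if NAME is among the listed locales. (Hypothetical helper.)
locale_in_list() {
    want=$(normalize_locale "$1")
    while IFS= read -r line; do
        [ "$(normalize_locale "$line")" = "$want" ] && return 0
    done
    return 1
}
```

Usage would be something like: locale -a | locale_in_list en_US.UTF-8 || echo "en_US.UTF-8 is not installed"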

zhenlin
Veteran

Joined: 09 Nov 2002
Posts: 1361

Posted: Tue Feb 11, 2003 10:48 am

No, it just means it is time to lobby for 8-bit byte ints and 16-bit unsigned chars. Right now, we only have 8-bit chars.

UTF-8 requires libc support in various string functions... You don't want weird things happening to your characters.

Play with Mac OS X or Windows NT if you want your UTF-8/Unicode.

plate
Bodhisattva

Joined: 25 Jul 2002
Posts: 1663
Location: Berlin

Posted: Tue Feb 11, 2003 11:31 am

Try recompiling glibc with USE="nls" and report back if it works, please.

barmaley
n00b

Joined: 26 Nov 2002
Posts: 36

Posted: Tue Feb 11, 2003 10:27 pm

I searched the net and found the solution. To enable UTF-8 support the following command must be executed as root:

Code:
localedef -f UTF-8 -i en_US en_US.UTF-8


This works the same way for other languages: just replace en_US with another locale name. glibc 2.2 or newer is required.
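
For several languages at once, the same command can be generated in a loop. This is a minimal sketch, assuming the base locales you name (cs_CZ, de_DE, ...) have source definitions on your system. print_localedef_cmds is a hypothetical helper that only prints the commands, so they can be reviewed before being run as root.

```shell
#!/bin/sh
# print_localedef_cmds BASE...: print (do not run) one localedef command
# per base locale, following the pattern of the command shown above.
# (Hypothetical helper; nothing on the system is modified.)
print_localedef_cmds() {
    for base in "$@"; do
        printf 'localedef -f UTF-8 -i %s %s.UTF-8\n' "$base" "$base"
    done
}
```

Once the printed lines look right, they could be piped to a root shell, for example: print_localedef_cmds en_US cs_CZ | sh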

After that you can set LC_CTYPE as shown in my first post, and all locale-aware programs that do not use char* tricks should work with UTF-8. Unfortunately not all programs qualify; so far I know of one that doesn't: fte. Standard Unix programs like less, cat, vi, grep etc. seem to be OK.

Apparently it's not only a Gentoo problem. Some people using other distributions reported the same problem, while on others UTF-8 seems to work out of the box. Is there a special reason why UTF-8 isn't enabled on Gentoo by default? I think the installation instructions should at least mention how to enable it.

plate
Bodhisattva

Joined: 25 Jul 2002
Posts: 1663
Location: Berlin

Posted: Wed Feb 12, 2003 12:56 am

Could you file a bug report about this, please? There's a section at bugs.gentoo.org that deals with the documentation, and going on record there is the fastest way to catch the devs' attention and get the installation manual or other docs modified.

roman
n00b

Joined: 20 May 2002
Posts: 17

Posted: Wed Feb 12, 2003 8:32 am    Post subject: UTF-8 coding for locales

The "standard" locales are created when the glibc library is built.
glibc builds these locales according to the localedata/SUPPORTED file
(plus some additions defined by the Makefile).

So if a locale (like cs_CZ.UTF-8) is not listed there,
its locale definition files are not added to the catalogue.

A user CAN create the required locale definition with the glibc
localedef utility, but the glibc team cannot guarantee that
such a locale is safe to use.

So, because Gentoo uses packages compiled the "standard" way, some
locale/encoding combinations are missing...

Gentoo could patch the SUPPORTED file, but that is the road
to hell, because the Gentoo devs cannot guarantee that the
injected locales will work for every combination of USE flags and programs.

So it is better to describe in some (localisation) guide how
to "inject" a user-defined locale/encoding definition.

As mentioned, simply adding the UTF-8 encoding for a known
locale is easy:

localedef -i <locale> -f <codepage> <output_file_or_default>
e.g.: localedef -i cs_CZ -f UTF-8 cs_CZ

See the localedata/README file in the glibc source directory.

localedef has no man page, but localedef --help can be useful.
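
To check whether a locale is in the SUPPORTED list before reaching for localedef, something like the following could be used. It is a sketch under assumptions: supported_has is a made-up name, the path to the file depends on where the glibc sources were unpacked, and the file is assumed to contain entries of the form "name/charmap" (for example "cs_CZ/ISO-8859-2"), one per line.

```shell
#!/bin/sh
# supported_has FILE LOCALE: succeed if FILE (a copy of glibc's
# localedata/SUPPORTED) has an entry whose part before the "/" is
# exactly LOCALE. (Hypothetical helper; the file format is an assumption.)
supported_has() {
    awk -v want="$2" '
        { split($1, parts, "/"); if (parts[1] == want) found = 1 }
        END { exit !found }
    ' "$1"
}
```

For example: supported_has localedata/SUPPORTED cs_CZ.UTF-8 || echo "not built by default"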

wanthalf
n00b

Joined: 18 Jun 2003
Posts: 25

Posted: Wed Jun 18, 2003 9:44 pm    Post subject: UTF-8

I managed to compile and set a cs_CZ.UTF-8 locale; however, I still do not see the correct characters: I see UTF-8 rendered as ISO-8859-1 (or -2?) double characters. Well, sometimes: it is OK in the "mc" hint bar, but not in the menu or in other messages. I tried some different fonts and consoletrans files, but without success.

Any ideas?

Thanks,
Wanthalf

barmaley
n00b

Joined: 26 Nov 2002
Posts: 36

Posted: Wed Jun 18, 2003 11:15 pm

AFAIK, either mc itself or one of the libraries it relies on is not fully UTF-8 compatible. At least this was the case with the last version I tried: the internal mc viewer did not work at all (it showed garbage).

Do you see correct symbols with standard Linux programs like "cat"? If so, it's definitely an mc problem, so no fonts will help you.
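
A simple probe for this test is to print a known UTF-8 sequence and look at what the terminal draws. The sketch below (utf8_probe is a made-up name) emits the two bytes C5 99, which are the UTF-8 encoding of the Czech letter r with caron (U+0159). If the terminal really is in UTF-8 mode it should show one letter; two characters, or one wrong character plus garbage, suggest the bytes are being interpreted as a single-byte encoding such as ISO-8859-1 or ISO-8859-2.

```shell
#!/bin/sh
# utf8_probe: print U+0159 (r with caron) as raw UTF-8 bytes, written as
# octal escapes so the script itself stays plain ASCII, then a newline.
utf8_probe() {
    printf '\305\231\n'
}
```

Running utf8_probe | od -An -tx1 shows the bytes that were actually sent, independent of what the terminal renders.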

wanthalf
n00b

Joined: 18 Jun 2003
Posts: 25

Posted: Thu Jun 19, 2003 2:07 pm    Post subject: UTF-8

No, the other way round: the hint bar in MC is the ONLY place where it DOES work. It doesn't work anywhere else.

barmaley
n00b

Joined: 26 Nov 2002
Posts: 36

Posted: Thu Jun 19, 2003 8:46 pm

If so, maybe it's really a font problem? Do you have this problem in the text console or in a terminal emulator (which one?), or both?

wanthalf
n00b

Joined: 18 Jun 2003
Posts: 25

Posted: Thu Jun 19, 2003 9:00 pm

Everywhere. Changing fonts doesn't help. Could the combination of the cs_CZ and the UTF-8 make the system interpret the characters as the old iso-8859-2 coding? Is there really no information how to handle the encoding in the first (cs_CZ) part?

barmaley
n00b

Joined: 26 Nov 2002
Posts: 36

Posted: Thu Jun 19, 2003 11:49 pm

I just tried it the following way:

Code:

localedef -i cs_CZ -f UTF-8 cs_CZ.UTF-8
export LC_ALL=cs_CZ.UTF-8
gnome-terminal --disable-factory


If I view a Unicode text file, for example using the standard cat program, everything is right. However, if I run programs in the Czech locale, in some programs the Czech messages are broken (for example, manual pages). So my guess is that the UTF-8 support is OK, but text messages in some programs are encoded in the wrong way. Unfortunately, most programs do not have Czech messages at all, so I could not check much. I tried the following:

mc - ok (except tips)
man man - manual page is wrong, program itself has no Czech messages
pinfo - own messages ok, contents wrong
mutt - ok

As to manual pages: it may be a general encoding problem, maybe a bug in groff or somewhere else. As to mc: I wouldn't pay much attention to mc now, because it's very buggy.

Here is a screenshot:

http://www.hashbang.de/pics/cz-utf.png

Do you see the same when you type 'mutt --help'? If not, did you set up the locale exactly as shown above?

wanthalf
n00b

Joined: 18 Jun 2003
Posts: 25

Posted: Fri Jun 20, 2003 10:41 am

Well, here is my screenshot:

http://web.ff.cuni.cz/~vondric/screen.png

You can see it breaks the whole Konsole.

The man pages are probably still written in ISO-8859-2. I remember this problem from RedHat, where UTF-8 works well anyway.

++:
Oh, sorry. Now I see it is OK in gnome-terminal! But not in Konsole, nor on the plain console! Strange.

Another strange problem:
When I run e.g. gimp (or another gtk1 app) as a regular user, there is no text: no text in the splash screen, on buttons, or in the menus (there are only keybindings in the menus!). When I do the same as root, it is (almost?!?) OK.

Maybe, there are some other problems? Could it have something in common with this problem:

This is what I get when I run any kde-app as a user:

libGL error: failed to open DRM: Operation not permitted
libGL error: reverting to (slow) indirect rlibGL error: InitDriver failed

And this is what I get when I run it as root:

libGL error: InitDriver failed
libGL error: InitDriver failed
kbuildsycoca running...

wanthalf
n00b

Joined: 18 Jun 2003
Posts: 25

Posted: Fri Jun 20, 2003 11:27 am

Once more. I tried restarting the system and now it is a bit better:

- It works even in Konsole.
- It works in GIMP (and other gtk1-apps) ONLY when running as root, but not as a regular user!
- The plain console (text-mode/fb) behaves exactly like Konsole did behave on the screenshot in my last post.
[- Well, you're right MC is not a good example, but let me just mention it: in Konsole/x-term there are correct letters but only those that are common to ISO-8859-1 and 8859-2. The others show incorrectly. In the plain text console everything is wrong.]

I guess it has something to do with the console fonts (and/or consoletrans? what does this mean at all?). But which should I use?

And what about the strange GIMP behaviour?

barmaley
n00b

Joined: 26 Nov 2002
Posts: 36

Posted: Fri Jun 20, 2003 12:10 pm

Quote:

Once more. I tried restarting the system and now it is a bit better:
- It works even in Konsole.


This is probably because Konsole shares some data among all its processes. gnome-terminal does it the same way, so I have to use the '--disable-factory' option, as you see above. Maybe there is a similar option for Konsole.

Quote:

- It works in GIMP (and other gtk1-apps) ONLY when running as root, but not as a regular user!


Maybe for your proper user GTK uses a non-unicode font? Check ~/.gtkrc for both root and user, you can set the font there.

Quote:

In the plain text console everything is wrong


Did you set the locale before spawning the tty? I'm not quite sure, but I think just exporting LC_ALL in the current session won't help; you have to set it somewhere in /etc/init.d/local and respawn the tty (or reboot). But since I almost never use the text console, I never tried to set up UTF-8 there, so it's just a guess.
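
To see which setting a given shell actually inherited, it helps to remember the order in which the locale variables are consulted: LC_ALL overrides LC_CTYPE, which overrides LANG. The sketch below (effective_ctype is a made-up name) just applies that order to the current environment. Falling back to "C" when nothing is set is a simplification; the real default is up to the C library.

```shell
#!/bin/sh
# effective_ctype: print the locale name that character handling will use
# in this environment, following the precedence LC_ALL > LC_CTYPE > LANG.
# Empty variables count as unset. The final "C" fallback is an assumption.
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'C\n'
    fi
}
```

Running effective_ctype on the text console and again inside a terminal emulator shows whether both sessions picked up the same value.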

wanthalf
n00b

Joined: 18 Jun 2003
Posts: 25

Posted: Fri Jun 20, 2003 12:18 pm

Quote:

processes. gnome-terminal does it the same way, so I have to use the '--disable-factory' option, as you see above. Maybe there is a similar option for Konsole.


It works now both in gnome-terminal and Konsole, as I said. I didn't need to use any options.

Quote:

Did you set the locale before spawning the tty? I'm not quite sure, but I think, just exporting LC_ALL in the current session won't help, you have to set it somewhere in /etc/init.d/local and respawn the tty (or reboot). But since I almost never use the text console, I never tried to set up UTF-8 there, so it's just a guess.


I rebooted the computer with LC_ALL setting in /etc/env.d/00basic and after an env-update. Can I do more?

Thanks,
W.

barmaley
n00b

Joined: 26 Nov 2002
Posts: 36

Posted: Fri Jun 20, 2003 12:50 pm

Quote:
It works now both in gnome-terminal and Konsole, as I said. I didn't need to use any options.


I mean, with this option there is no need to reboot. And we, Linux users, usually hate to reboot our computers, don't we? ;-)


Quote:
I rebooted the computer with LC_ALL setting in /etc/env.d/00basic and after an env-update. Can I do more?

Sorry, I never tried it myself, so I cannot give you 100% accurate advice here, but here are some guesses:

- I'm not sure if /etc/env.d/00basic is executed before the ttys are spawned. You could try to set LC_ALL or LANG in /etc/rc.d/local to be sure, if it doesn't work.

- You could try to play with setfont, maybe there is a font problem. For example:

Code:
setfont /usr/share/consolefonts/some_font  -m /usr/share/consoletrans/some_translation


However, I'm not sure which settings are the right ones. setfont also has a '-u' option.

HTH

wanthalf
n00b

Joined: 18 Jun 2003
Posts: 25

Posted: Fri Jun 20, 2003 2:17 pm

I managed to solve some problems:

- For GTK1 applications: there was no .gtkrc file in either /root or the user's home directory (or at least no useful gtkrc that made any difference). I copied /etc/gtk/gtkrc.utf8 from a RedHat 9 installation to use as my .gtkrc on Gentoo, and now GIMP (et al.) works correctly even for me as a regular user. Before, it worked only when I logged into a GNOME session; now it works under a KDE session as well. The gtkrc file has the following contents:

Quote:

style "default-text" {
fontset = "-*-helvetica-medium-r-normal--*-120-*-*-p-*-*-*"
}

class "GtkWidget" style "default-text"


- For the text console: I found some magic ;-). Simply running a command called "unicode_start" enables Unicode mode for the console, and everything works there as well. The only problem is again MC, which goes even crazier than in the xterm window, but I guess this can be solved too, at least to the state it is in under xterm (with at least some characters working). I will have a look at this as well.

wanthalf
n00b

Joined: 18 Jun 2003
Posts: 25

Posted: Fri Jun 20, 2003 2:49 pm

A better MC solution comes later, so I have deleted this message.

Last edited by wanthalf on Fri Jun 20, 2003 4:24 pm; edited 2 times in total

barmaley
n00b

Joined: 26 Nov 2002
Posts: 36

Posted: Fri Jun 20, 2003 2:50 pm

Quote:
For the text console: I found some magic ;-). Simply running a command called "unicode_start" enables Unicode mode for the console, and everything works there as well.


That sounds interesting. Where did you get this program? Please do

Code:
qpkg -f `which unicode_start`

wanthalf
n00b

Joined: 18 Jun 2003
Posts: 25

Posted: Fri Jun 20, 2003 3:38 pm

The answer from qpkg:

sys-apps/kbd *

(BTW, unicode_start is just a shell script that loads the correct fonts and keymap; the console switching itself is done by an echo -n -e '\033%G'.)
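
The escape sequence mentioned above can also be sent with printf, which behaves the same across shells, unlike echo -n -e. A minimal sketch: the function names are made up, and it covers only the mode switch, not the font and keymap loading that unicode_start also does. ESC % G selects UTF-8 on the Linux console and ESC % @ returns to the default single-byte mode.

```shell
#!/bin/sh
# console_utf8_on: emit ESC % G, which puts the Linux console in UTF-8 mode.
# The percent sign is doubled because printf treats % as a format character.
console_utf8_on() {
    printf '\033%%G'
}

# console_utf8_off: emit ESC % @, which switches back to the default mode.
console_utf8_off() {
    printf '\033%%@'
}
```

These only make sense on a real text console (tty); inside an X terminal emulator the encoding is controlled by the emulator itself.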

I found out that RedHat uses slang libraries (MC uses them) patched for UTF-8 support. I used their src.rpm and installed their version (I know, it is a bit dirty to overwrite the Gentoo version :-( ). Now there are both libslang and libslang-utf8 libraries in /usr/lib. I have read that programs compiled against one are not binary compatible with the other. I suppose it will be necessary to recompile MC against the utf8 version. Any ideas how to do it, ideally with the help of portage???

wanthalf
n00b

Joined: 18 Jun 2003
Posts: 25

Posted: Fri Jun 20, 2003 4:04 pm    Post subject: MC and UTF-8

Well, actually, MC in RedHat is patched as well. So I installed rpm-tools, downloaded the slang and mc src.rpm packages from RedHat 9, ran rpmbuild --rebuild and installed the resulting rpm packages (first installed slang, then rebuilt mc). Of course --nodeps was a necessary option in almost all cases. I know this is neither Gentoo-friendly nor RedHat-friendly, but I cannot write ebuilds, and this is easy and WORKS!

zhenlin
Veteran

Joined: 09 Nov 2002
Posts: 1361

Posted: Sat Jun 21, 2003 2:29 am

Writing ebuilds is easier than writing RPMs.

But I suppose that people coming from RedHat/Mandrake find this easier.

Unpack the RPM data somehow, extract the source tarballs, put them into distfiles; read the specs, modify the existing ebuild accordingly.

I would do this, but I have no access to those RPMs. Would someone point me to a URL where I can get those?

wanthalf
n00b

Joined: 18 Jun 2003
Posts: 25

Posted: Sat Jun 21, 2003 8:28 am

I cannot write RPMs. I tried to modify openoffice-bin-1.1_beta1.ebuild to install the binary OpenOffice.org 1.1beta2-czech-edition, and it failed anyway in the end because it couldn't write some stupid temp file... Anyway, there were some problems with fonts when I installed it manually (even with the "unstable" official OOo-bin-1.1_beta1-english; I couldn't see the most important fonts in OOo) and with antialiasing, so I switched back to the English OOo-bin-1.0.2. BTW: I do not understand why the OOo-1.0.2 source ebuild doesn't support Czech. I modified it and it seemed to find the correct language patch, compile (12 hours) and install, but I didn't find any installed binaries at all, just symlinks to them from the ebuild script... strange.

You can get RedHat's source RPMs anywhere you can get RedHat, for example at http://ftp.redhat.com/pub/redhat/linux/9/en/os/i386/SRPMS/ or any mirror. It's easy to convert them to tar.gz with the rpm2targz command, which I found in Gentoo. In the tar.gz file you find all the necessary files: the source tar.bz2 and all patches, together with the .spec file. Well, you know this, of course, sorry.
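
Once a source RPM has been converted and unpacked into a directory, sorting its contents for an ebuild could look roughly like this. It is only a sketch: stage_srpm_sources is a made-up helper, the two directories are passed in by the caller (on a Gentoo box of this era the second would presumably be /usr/portage/distfiles, which is an assumption), and it does nothing beyond copying tarballs and listing the patches and spec file that still need to be read by hand.

```shell
#!/bin/sh
# stage_srpm_sources SRC_DIR DIST_DIR: copy source tarballs found in an
# unpacked source RPM into DIST_DIR, and list patches and .spec files
# that should be reviewed when adapting the ebuild. (Hypothetical helper.)
stage_srpm_sources() {
    src_dir=$1
    dist_dir=$2
    for f in "$src_dir"/*.tar.gz "$src_dir"/*.tar.bz2 "$src_dir"/*.tgz; do
        [ -f "$f" ] || continue   # skip patterns that matched nothing
        cp "$f" "$dist_dir"/ && printf 'source: %s\n' "${f##*/}"
    done
    for f in "$src_dir"/*.patch "$src_dir"/*.spec; do
        [ -f "$f" ] || continue
        printf 'review: %s\n' "${f##*/}"
    done
}
```

The patch list is the interesting part for slang and mc, since those patches are what carries the UTF-8 support.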

zhenlin
Veteran

Joined: 09 Nov 2002
Posts: 1361

Posted: Sat Jun 21, 2003 9:54 am

This is going to be a problem...

Looks like I will have to fork the slang ebuild, since an ABI compatible version is not built.

I'm going to test whether the utf8 patches work.

All times are GMT
Page 1 of 3