Gentoo Forums
man and utf8: problem with accents
Gentoo Forums Forum Index Other Things Gentoo
FrancoisVal
Tux's lil' helper
Joined: 12 May 2005
Posts: 82
Location: Namur, Belgique

Posted: Sun Feb 05, 2006 2:24 pm    Post subject: man and utf8: problem with accents

Hello everybody,
I have switched to UTF-8 on Gentoo. Everything seems to work correctly except man pages. I have read the guide at http://www.gentoo.org/doc/en/utf-8.xml and modified /etc/man.conf as suggested, but it didn't work. The result is the same in the console and in a graphical terminal: none of the accents and other typical French characters are displayed correctly.

Does anybody know a solution to this problem?
Thanks for your help,

François Valenduc
chrroessner
Apprentice
Joined: 02 Dec 2003
Posts: 156
Location: Germany

Posted: Sun Feb 05, 2006 3:50 pm

Check /etc/env.d/70less for the LESSCHARSET option (as described in the doc), run env-update, and check the NROFF line in /etc/man.conf. This should fix your problems, I hope.
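A sketch of those steps, assuming the stock Gentoo layout; the LESSCHARSET line is the one the UTF-8 guide suggests, and the rest of 70less may differ between versions of less. env-update regenerates /etc/profile.env, and sourcing /etc/profile loads it into the current bash or zsh session:

```shell
# Assumed addition to /etc/env.d/70less (as suggested by the UTF-8 guide):
#   LESSCHARSET="utf-8"
# Regenerate the environment (as root) and load it into the current shell:
env-update && source /etc/profile
# Confirm the variable is now set
echo "$LESSCHARSET"
```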

Rössi
FrancoisVal
Tux's lil' helper
Joined: 12 May 2005
Posts: 82
Location: Namur, Belgique

Posted: Sun Feb 05, 2006 4:58 pm

Thanks for your answer.

Unfortunately, that didn't solve the problem. I already had the line export LESSCHARSET=utf-8 in the ZSH config file (/etc/zsh/zshenv), and it doesn't help. I had read in the HOWTO that ZSH doesn't support UTF-8, but that no longer seems to be true: I can display and handle UTF-8 characters correctly with ZSH (in a console or in a graphical terminal). I have also tried bash, but I still have problems with man pages. Is it really possible to use man with UTF-8 correctly when you are a native French speaker and want to read French with accents or other characters (like é, è, à, ç...)? None of these characters are rendered correctly in man pages. For example, for "é", I get "é".
ecatmur
Advocate
Joined: 20 Oct 2003
Posts: 3595
Location: Edinburgh

Posted: Sun Feb 05, 2006 6:28 pm

"é" is what you get when you output UTF-8 text to an ISO-8859-1 terminal. Are you sure your terminal is in UTF-8? What happens in bash when you issue
Code:
echo $'\xc3\xa9'
If your terminal is in UTF-8 you will get é, otherwise é.
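The same garbage can be reproduced without man: take the two UTF-8 bytes of é (0xC3 0xA9), read them as ISO-8859-1 and convert them to UTF-8 again. A minimal sketch, assuming iconv (from glibc or GNU libiconv) is installed; the function names are only for illustration:

```shell
#!/bin/sh
# Misread the UTF-8 bytes of "é" as ISO-8859-1 and re-encode them as UTF-8.
# The result is two characters: "Ã" (U+00C3) and "©" (U+00A9).
mojibake() {
    printf '\303\251' | iconv -f ISO-8859-1 -t UTF-8
}

# Converting back the other way undoes the damage and yields "é" again.
unmojibake() {
    iconv -f UTF-8 -t ISO-8859-1
}

mojibake; echo                 # prints Ã© on a UTF-8 terminal
mojibake | unmojibake; echo    # prints é on a UTF-8 terminal
```

Seeing "Ã©" on a UTF-8 terminal therefore means some program in the chain treated UTF-8 input as Latin-1 and converted it a second time.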
_________________
No more cruft
dep: Revdeps that work
Using command-line ACCEPT_KEYWORDS?
FrancoisVal
Tux's lil' helper
Joined: 12 May 2005
Posts: 82
Location: Namur, Belgique

Posted: Mon Feb 06, 2006 9:10 pm

I have tested it: when I type echo $'\xc3\xa9', I get "é" in Konsole, xterm and the text console, with both bash and zsh. When I use less to view a file, French characters are also displayed correctly (in all the situations listed above). So the problem really lies with man, groff or nroff. As explained in the HOWTO, I have put the following line in /etc/man.conf:
Code:
NROFF /usr/bin/nroff -mandoc -c

Is there a minimum version of man or groff required to use UTF-8 encoding correctly?
Thanks for your help
ecatmur
Advocate
Joined: 20 Oct 2003
Posts: 3595
Location: Edinburgh

Posted: Mon Feb 06, 2006 11:24 pm

The comments in man.conf suggest that in some circumstances nroff will "double convert" to UTF-8, i.e. it interprets a UTF-8 stream as ISO-8859-1 (latin1) and applies the latin1->UTF-8 conversion. That would give the results you are experiencing.

Try "/usr/bin/nroff -c -mandoc -Tlatin1".
neysx
Retired Dev
Joined: 27 Jan 2003
Posts: 795

Posted: Tue Feb 07, 2006 11:37 am

With a fr_FR.utf8 locale and /usr/bin/nroff -c -mandoc -Tlatin1 in man.conf, it almost works, but not quite: some characters are still borked. With /usr/bin/nroff -mandoc -Tlatin1 (removing -c), underlined é characters are displayed properly, but every à is shown as <C3>.
Strangely enough, /usr/share/man/fr/man1/man.1.gz is UTF-8 encoded. I am not familiar with man pages and *roff stuff, but it looks like there's a conversion to another encoding (latin?) and then another one back to UTF-8, and some characters are lost in the process.
If you recode /usr/share/man/fr/man1/man.1.gz to latin:
Code:
gunzip /usr/share/man/fr/man1/man.1.gz
recode u8..l9 /usr/share/man/fr/man1/man.1
and use /usr/bin/nroff -mandoc in man.conf, then man man is properly displayed.

Keeping the same setup, I tried man nano. Guess what, it works. Why? /usr/share/man/fr/man1/nano.1.gz is latin9-encoded.

The bottom line is that not all man pages appear to share the same encoding.
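To see which pages are affected without opening them one by one, a rough approach is to decompress each page and ask iconv to validate it as UTF-8; iconv stops with a non-zero exit status at the first invalid byte sequence. A sketch, assuming gzip-compressed pages and that gzip and iconv are installed; the function name is hypothetical. It is a heuristic: pure-ASCII pages pass, which is harmless because ASCII reads the same in Latin-9 and UTF-8.

```shell
#!/bin/sh
# Print every gzipped man page under a directory that is NOT valid UTF-8
# (for example Latin-9 pages). Defaults to the French man pages.
non_utf8_pages() {
    dir=${1:-/usr/share/man/fr}
    find "$dir" -type f -name '*.gz' | while IFS= read -r page; do
        # The pipeline's status is iconv's: non-zero on invalid UTF-8 input
        if ! gzip -dc "$page" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
            printf '%s\n' "$page"
        fi
    done
}
```

On the system described above, non_utf8_pages /usr/share/man/fr would be expected to list nano.1.gz but not man.1.gz.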

Hth
FrancoisVal
Tux's lil' helper
Joined: 12 May 2005
Posts: 82
Location: Namur, Belgique

Posted: Tue Feb 07, 2006 12:23 pm

So it seems that different man pages use different character encodings. If a man page is encoded in latin9, using nroff -mandoc works even if the selected locale is fr_BE.utf-8? That seems quite strange to me. Shouldn't we open a bug on Bugzilla asking whether all man pages should use the same encoding?

I will check on my computer when I am back from work and see whether I can make some progress.
Thanks for your help
FrancoisVal
Tux's lil' helper
Joined: 12 May 2005
Posts: 82
Location: Namur, Belgique

Posted: Wed Feb 08, 2006 9:50 am

Indeed, converting man's own man pages to latin9 and using nroff -mandoc works for man and nano. However, I tried the trick with other man pages (ls and mount, for example) and couldn't get those pages to display correctly. Furthermore, having to recode all man pages into a suitable encoding would be very tiresome. I would like to find an easier solution (without going back to ISO-8859-1), if possible!

Thanks for your help
buzz22
n00b
Joined: 12 Apr 2006
Posts: 1

Posted: Wed Apr 12, 2006 2:55 pm

Take a look at http://www.haible.de/bruno/packages-groff-utf8.html. It's a groff extension that lets you view UTF-8 encoded man pages.
It works for me. Just replace
Code:
NROFF      /usr/bin/nroff -c -mandoc
by
Code:
NROFF      /usr/bin/groff-utf8 -Tutf8 -mandoc
in your /etc/man.conf.

Hope it can help you,
Laurent
Dominique_71
Veteran
Joined: 17 Aug 2005
Posts: 1877
Location: Switzerland (Romandie)

Posted: Mon Nov 18, 2013 8:54 am

buzz22 wrote:
Take a look at http://www.haible.de/bruno/packages-groff-utf8.html. It's a groff extension that allows to view UTF-8 encoded man pages.
It works for me. Just replace
Code:
NROFF      /usr/bin/nroff -c -mandoc
by
Code:
NROFF      /usr/bin/groff-utf8 -Tutf8 -mandoc
in your /etc/man.conf.

Hope it can help you,
Laurent


Thanks Laurent, it worked. After trying everything else, I am now able to get the correct characters. It screwed up my custom less colour scheme, but at least the characters are the right ones.
_________________
"Confirm You are a robot." - the singularity
grumblebear
Apprentice
Joined: 26 Feb 2008
Posts: 202

Posted: Tue Nov 14, 2017 12:06 pm

Sorry for waking up an old thread. But after years of ignoring the garbage displayed in localized man pages on a fully UTF-8 configured Gentoo, I have now found the correct solution.

The line in /etc/man.conf should read
Code:
NROFF           /usr/bin/preconv | /usr/bin/nroff -mandoc
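To apply that line without editing the file by hand, a small sketch using GNU sed; it keeps the original as man.conf.bak and only rewrites an uncommented NROFF line. The function name is hypothetical, and the default path is the /etc/man.conf used by sys-apps/man:

```shell
#!/bin/sh
# Replace the active NROFF line in man.conf with the preconv pipeline.
# -i.bak edits in place and keeps the original file as <conf>.bak.
use_preconv() {
    conf=${1:-/etc/man.conf}
    # '#' is the s/// delimiter, so the '|' and '/' in the command need no
    # escaping; commented lines (starting with '#') do not match ^NROFF.
    sed -i.bak 's#^NROFF[[:space:]].*#NROFF           /usr/bin/preconv | /usr/bin/nroff -mandoc#' "$conf"
}
```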


Maybe someone can update the wiki at https://wiki.gentoo.org/wiki/UTF-8.

Or perhaps a bug should be filed for sys-apps/man.

Edit:
This solution garbles man pages that are not UTF-8 encoded.

Conclusion: it's better to use sys-apps/man-db and get rid of sys-apps/man.
mike155
Advocate
Joined: 17 Sep 2010
Posts: 4438
Location: Frankfurt, Germany

Posted: Tue Nov 14, 2017 3:42 pm

Hi grumblebear, thanks for sharing this! This is indeed the right solution and it works.

I dug a little deeper into this. It's 2017 and groff still doesn't support UTF-8 encoded input streams. What a shame! preconv does the right thing: it tries to detect the encoding of the input stream, and if it detects UTF-8, it escapes all non-ASCII characters so that groff can't garble them.

There are other solutions. groff-utf8, for example. But I think the preconv-solution is currently the best we can do.
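The escaping step can be watched on its own. preconv rewrites each non-ASCII character as a groff entity of the form \[uXXXX], so é becomes \[u00E9] and nroff only ever sees ASCII. A sketch, assuming groff 1.20 or later (which ships preconv) and gzip; the function names are hypothetical. The demo passes -e utf-8 to force the input encoding instead of relying on detection, while render_page mirrors the NROFF line from the thread and lets preconv guess:

```shell
#!/bin/sh
# Force UTF-8 input so the bytes of "é" come out as the ASCII escape \[u00E9].
escape_demo() {
    printf 'caf\303\251\n' | preconv -e utf-8
}

# Render one man page outside of man(1), like the NROFF line above.
# gzip -dcf passes uncompressed files through unchanged.
render_page() {
    gzip -dcf "$1" | preconv | nroff -mandoc
}
```

For example, render_page /usr/share/man/fr/man1/man.1.gz | less shows whether a given page survives the pipeline before changing man.conf.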
charles17
Advocate
Joined: 02 Mar 2008
Posts: 3664

Posted: Tue Nov 14, 2017 5:19 pm

grumblebear wrote:
Maybe someone can update the wiki at https://wiki.gentoo.org/wiki/UTF-8.

What's wrong with »This is only needed when sys-apps/man is used instead of sys-apps/man-db.«?
grumblebear
Apprentice
Joined: 26 Feb 2008
Posts: 202

Posted: Tue Nov 14, 2017 8:48 pm

charles17 wrote:
grumblebear wrote:
Maybe someone can update the wiki at https://wiki.gentoo.org/wiki/UTF-8.

What's wrong with »This is only needed when sys-apps/man is used instead of sys-apps/man-db.«?


That sentence is absolutely right. Only the following code block should be updated. Now, in 2017, nearly all man pages are UTF-8 encoded, so it is best to put the preconv step into the NROFF command.
charles17
Advocate
Joined: 02 Mar 2008
Posts: 3664

Posted: Wed Nov 15, 2017 7:47 am

grumblebear wrote:
That sentence is absolutely right. Only the following code block should be updated. ...

Feel free to do so. Everybody is allowed to contribute to the Gentoo wiki.
All times are GMT