Gentoo Forums
nroff text formatting man pages utf8/ascii
khayyam
Watchman
Joined: 07 Jun 2012
Posts: 6227
Location: Room 101

Posted: Mon Jun 16, 2014 2:36 pm    Post subject: nroff text formatting man pages utf8/ascii

hello ...

This is a problem I've been ignoring for some time, as it seemed that no matter what I did the formatting of some characters in manpages would be wrong.

If your locale/env is utf-8/unicode you can expect that somewhere along the line some char in the console/terminal is going to display incorrectly, so you set the unicode useflag, your locale, set the consolefont, edit this or that config file, etc, etc, but there is *always* something that for whatever reason doesn't work correctly with utf8. Case in point: nroff and man pages ...

The UTF-8 section for man on the wiki seems to be dated, as the suggested change is what is actually provided by default in man.conf ... well, at least the removal of '-Tascii' ... this nonetheless produces the following with "NROFF /usr/bin/nroff -mandoc":

Code:
[...] builtin spelling  correc[]
tion

... the square brackets (above) stand in for a block character, which should be displayed as a dash (marking a word break).
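
A possible explanation for the block, offered as a guess rather than a diagnosis: in a UTF-8 locale nroff picks the utf8 output device, which (as far as I know) writes a hyphenation break as U+2010 HYPHEN rather than the ASCII hyphen-minus U+002D. A font with no glyph for U+2010 would then show a block. The Python sketch below only illustrates the difference between the two code points; the helper name ascii_hyphens is made up for the example.

```python
import unicodedata

HYPHEN = "\u2010"       # what the utf8 device is assumed to emit at a word break
HYPHEN_MINUS = "-"      # plain ASCII U+002D

def ascii_hyphens(text: str) -> str:
    """Replace U+2010 with the ASCII hyphen-minus (hypothetical helper)."""
    return text.replace(HYPHEN, HYPHEN_MINUS)

if __name__ == "__main__":
    # Show code point, Unicode name and UTF-8 bytes of both characters.
    for ch in (HYPHEN, HYPHEN_MINUS):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {ch.encode('utf-8')!r}")
    print(ascii_hyphens("builtin spelling  correc\u2010"))
```

U+2010 takes three bytes in UTF-8, so a terminal or font that handles plain ASCII perfectly well can still stumble over it.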

If, following the comment re "prevent[ing] double conversion to utf8" I use '-Tlatin1' the dash is then correctly displayed but bullet lists are displayed as "<B7>", eg:

Code:
<B7>   changing directories with the cd builtin
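
The "<B7>" is probably the pager's way of showing a byte it cannot decode: with '-Tlatin1' the bullet is presumably written as the single Latin-1 byte 0xB7 (MIDDLE DOT), and a lone 0xB7 is not valid UTF-8. A small Python sketch of that mismatch (an illustration under that assumption, not a trace of what groff or the pager actually do):

```python
bullet_latin1 = "\u00b7".encode("latin-1")   # MIDDLE DOT as one Latin-1 byte
bullet_utf8 = "\u00b7".encode("utf-8")       # the same character encoded as UTF-8

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

if __name__ == "__main__":
    print(bullet_latin1, is_valid_utf8(bullet_latin1))
    print(bullet_utf8, is_valid_utf8(bullet_utf8))
```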

If I use '-Tascii' then bullet lists are displayed as "o" ... which is acceptable ... but then once in a while other character oddities creep in (which I'm unable to reproduce right now).

There is also a situation in which some chars in the console are not displayed correctly (which I guess means that the consolefont isn't properly utf8).

So, what is one supposed to do ... it just seems like it's impossible to find something that works in all instances. I'm fairly sure that otherwise my setup is configured correctly (though as I don't use the console that often I've not spent my energy on figuring out why some other characters in the shell are displayed incorrectly ... which is related to, but somewhat separate from, the issue with man).

best ... khay

EDIT: removed [SOLVED]


Last edited by khayyam on Mon Jun 30, 2014 12:09 pm; edited 2 times in total
steveL
Watchman
Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

Posted: Mon Jun 16, 2014 2:50 pm

khayyam: I'd ask #bash on freenode. I can do that if you like, but it's going to be much better if you do, since you know what you're talking about and can respond live. It would definitely be on-topic in ##workingset but you'll likely find more people who actually use nroff in #bash, and they definitely know about working with UTF-8.

Pop into #friendly-coders if you do come online.

igli.
khayyam
Watchman
Joined: 07 Jun 2012
Posts: 6227
Location: Room 101

Posted: Sat Jun 21, 2014 7:15 pm

thanks steve ... but I don't think this is solvable. UTF-8 support for the cli is basically impossible as there are too many factors in play (and all eyes are on some other sphere of 'integration')

Unicode is supposed to be fully backward compatible with ASCII but obviously in the case of nroff/groff/troff it can't handle the fact that the manpages are ascii (go figure). No matter what conversion is done some aspect of the display is incorrect (you just need to select which of those is *least* annoying).

khayyam wrote:
If I use '-Tascii' then bullet lists are displayed as "o" ... which is acceptable ... but then once in a while other character oddities creep in (which I'm unable to reproduce right now).

I've since identified the issue here: with -Tascii, umlauts (or other accents) are rendered like the following:

Code:
Ulrich MA1/4ller <ulm@gentoo[dot]org>

That basically means that '-mandoc -Tascii', '-mandoc -Tlatin1', or simply '-mandoc' (which should be utf8) all fail to produce the correct output in some part of the manpage.
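
The "MA1/4ller" output looks like what you would get if the two UTF-8 bytes of "ü" were read as two Latin-1 characters and each was then replaced by an ASCII approximation. The Python sketch below reproduces that chain; the fallback table is hand-written for this example and is an assumption, not groff's real mapping.

```python
# Hand-written stand-in for the ASCII fallbacks; an assumption, not groff's table.
ASCII_FALLBACK = {"\u00c3": "A", "\u00bc": "1/4"}

def misread_as_latin1(text: str) -> str:
    """Encode as UTF-8, then decode the bytes as if they were Latin-1."""
    return text.encode("utf-8").decode("latin-1")

def to_ascii(text: str) -> str:
    """Apply the illustrative fallback table character by character."""
    return "".join(ASCII_FALLBACK.get(ch, ch) for ch in text)

if __name__ == "__main__":
    garbled = misread_as_latin1("M\u00fcller")   # "ü" becomes two characters
    print(garbled)
    print(to_ascii(garbled))
```

If that reading is right, the page source is UTF-8 but is being treated as Latin-1 on input, which would be an input-side problem independent of the -T output device.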

I didn't mention it previously but I use rxvt-unicode as a terminal emulator, so there should be no issues in that regard.

best ... khay
mv
Watchman
Joined: 20 Apr 2005
Posts: 6747

Posted: Sun Jun 22, 2014 8:16 am

AFAIK, groff is basically unable to handle utf8: The last version is years old, and most parts of utf8 are on the todo list.
Flameeyes once put heirloom-nroff into the tree, which was able to deal with utf8 but lacked too many of groff's extensions, which many projects now rely on. For instance, I found that it is impossible to write an "ä" that is printed correctly by both heirloom-nroff and groff: no matter which way of specifying the symbol you use, none works on both (and I even tried some sort of "programming" in *roff to solve the problem).
khayyam
Watchman
Joined: 07 Jun 2012
Posts: 6227
Location: Room 101

Posted: Sun Jun 22, 2014 11:37 am

mv wrote:
AFAIK, groff is basically unable to handle utf8: The last version is years old, and most parts of utf8 are on the todo list.

mv ... thanks, I guess this makes the advice in the wiki simply incorrect. Do you also see the issue above with dashes?

EDIT: ok, so installing this UTF-8 groff wrapper and setting 'NROFF /usr/local/bin/groff-utf8 -Tutf8 -mandoc' in man.conf seems to have solved the above issues: dashes at line breaks are rendered, bullet lists are correct, umlauts/accents are correct. I'll mark this as [SOLVED] ... though I'll keep my eye on things and report any further oddities.

best ... khay
mv
Watchman
Joined: 20 Apr 2005
Posts: 6747

Posted: Sun Jun 22, 2014 1:46 pm

I did not know about this wrapper. Why is it not in the gentoo tree? I suppose many people have this problem...
khayyam
Watchman
Joined: 07 Jun 2012
Posts: 6227
Location: Room 101

Posted: Sun Jun 22, 2014 2:52 pm

mv wrote:
I did not know about this wrapper. Why is it not in the gentoo tree? I suppose many people have this problem...

mv ... no idea, if you look at how old the above linked page is (Last modified: 5 July 2005) you'd think that the issue would have seen some love ... but as with anything that's not about usability, the desktop, etc, these things tend to languish in the seventh circle of integration (a similar example would be net-tools which have been without maintainership since 2002 ... and not being maintained helped create the disparate tools, and disunity, we see currently).

best ... khay
khayyam
Watchman
Joined: 07 Jun 2012
Posts: 6227
Location: Room 101

Posted: Mon Jun 30, 2014 12:09 pm

well ... I spoke too soon, somehow (I assume due to the recent =dev-libs/icu-52.1 update) I'm back to the old behavior ... dashes at line breaks are again some un-displayable character. Everything on the system is up-to-date (stable mostly), revdep-rebuild and @preserved-rebuild show nothing needing to be rebuilt, and rebuilding man, groff, and the groff-utf8 wrapper makes no difference.

I'm not sure how I could solve the issue and then have it suddenly revert to the exact same behavior. I can't be sure it's icu but that seems the obvious culprit ... but groff is not linked in any way to icu, so go figure.

EDIT: searching b.g.o I see this comment from spanky ... "man-1.6f-r1 drops all -c and -T options by default and forces latest groff so as to avoid the dash funkiness" ... that's from Feb 2008, but I wonder if this "dash funkiness" is the same as the above, and if this was perhaps fixed but there is some other option, configure switch, or patch removal required.

EDIT2: seems Bug 121502 plays some role in this ... removing groff-1.19.2-man-unicode-dashes.patch makes no difference however.

best ... khay
steveL
Watchman
Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

Posted: Tue Nov 25, 2014 12:14 pm

@khayyam What about this topic?

I'm trying to get a definitive answer, so your input would be very useful.

iab,
igli.
khayyam
Watchman
Joined: 07 Jun 2012
Posts: 6227
Location: Room 101

Posted: Wed Nov 26, 2014 11:56 am

steveL wrote:
@khayyam What about this topic? I'm trying to get a definitive answer, so your input would be very useful.

hey steve ...

the '-k' switch to groff (recommended by r90 in the above linked post) doesn't resolve the issues ... I'd tried this some time back. There are still some chars which are displayed as blocks (specifically around email addresses, URLs, and line breaks within them). Umlauts (and other such chars), bullet lists, and regular "dash" line breaks are displayed correctly, so it does offer some improvement over the standard configuration (though the same can be achieved using the above linked groff-utf8).

In short, from my time spent poking at the problem I've concluded there is no method of getting man pages to display correctly in all respects ... it's basically irresolvable.

best ... khay
steveL
Watchman
Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

Posted: Wed Nov 26, 2014 6:40 pm

OK, VinzC's solution seems very similar to yours:
/etc/man.conf:
Code:
TROFF           /usr/local/bin/groff-utf8 -Tps -mandoc
NROFF           /usr/local/bin/groff-utf8 -Tutf8 -mandoc
JNROFF          /usr/local/bin/groff-utf8 -Tnippon -mandocj

so I'll go with that for now then.

Not sure what the exact issue is, beyond not having all the glyphs available in the display font.
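
One way to narrow that down might be to list which non-ASCII code points actually end up in the formatted page, and then check whether the terminal font has glyphs for them. The Python sketch below does only the listing part; the function name non_ascii_report is made up for this example.

```python
import sys
import unicodedata
from collections import Counter

def non_ascii_report(text: str) -> list:
    """Return (code point, Unicode name, count) for each non-ASCII character."""
    counts = Counter(ch for ch in text if ord(ch) > 127)
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), n)
        for ch, n in sorted(counts.items())
    ]

if __name__ == "__main__":
    # Read raw bytes so that invalid UTF-8 in the input does not abort the script;
    # undecodable bytes show up as U+FFFD REPLACEMENT CHARACTER.
    data = sys.stdin.buffer.read().decode("utf-8", errors="replace")
    for code, name, n in non_ascii_report(data):
        print(f"{code}  {name}  x{n}")
```

Saved under some name (say nonascii.py, purely as an example), it could be fed a page with something like "man bash | python3 nonascii.py". Any U+FFFD entries would point at an encoding mismatch rather than a missing glyph.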

If you have any corrections to the above, please let me know.
All times are GMT