khayyam Watchman
Joined: 07 Jun 2012 Posts: 6227 Location: Room 101
Posted: Mon Jun 16, 2014 2:36 pm Post subject: nroff text formatting man pages utf8/ascii
hello ...
This is a problem I've been ignoring for some time, as it seemed that no matter what I did, some characters in man pages would be formatted wrong.
If your locale/environment is UTF-8 (Unicode), you can expect some character in the console or terminal to display incorrectly somewhere along the line. So you set the unicode USE flag, set your locale and console font, edit this or that config file, and so on, but there is *always* something that, for whatever reason, doesn't work correctly with UTF-8. Case in point: nroff and man pages ...
The UTF-8 section for man on the wiki seems outdated: the change it suggests (at least the removal of '-Tascii') is already the default in man.conf. Nonetheless, with "NROFF /usr/bin/nroff -mandoc" the output looks like this:
Code: | [...] builtin spelling correc[]
tion |
... the square brackets above stand in for a block character, which should be displayed as a hyphen marking a word break.
If, following the comment about "prevent[ing] double conversion to utf8", I use '-Tlatin1', the hyphen is displayed correctly, but bullet lists are displayed as "<B7>", e.g.:
Code: | <B7> changing directories with the cd builtin |
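A likely explanation for the "<B7>": groff's latin1 device appears to render the bullet as U+00B7 MIDDLE DOT, i.e. the single byte 0xB7, which is not valid UTF-8 on its own, so a pager running in a UTF-8 locale shows the raw byte in <XX> hex notation instead of a glyph. The Python sketch below mimics that; the error-handler name "angle_hex" and the helper pager_view() are made up for illustration and are not part of man or less.

```python
import codecs

# Assumption: groff's latin1 device renders a bullet as U+00B7 MIDDLE DOT,
# which Latin-1 encodes as the single byte 0xB7.

def _angle_hex(err):
    """Decode-error handler: show each undecodable byte as <XX>."""
    bad = err.object[err.start:err.end]
    return "".join(f"<{b:02X}>" for b in bad), err.end

# "angle_hex" is a hypothetical handler name registered for this sketch.
codecs.register_error("angle_hex", _angle_hex)

def pager_view(data: bytes) -> str:
    """Roughly what a pager in a UTF-8 locale shows for raw output bytes."""
    return data.decode("utf-8", errors="angle_hex")

line = "\u00b7 changing directories with the cd builtin"
print(pager_view(line.encode("latin-1")))  # <B7> changing directories with the cd builtin
print(pager_view(line.encode("utf-8")))    # · changing directories with the cd builtin
```

The same byte is fine when the formatter emits UTF-8 (two bytes, 0xC2 0xB7), which is why -Tlatin1 only makes sense on a Latin-1 terminal.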
If I use '-Tascii', bullet lists are displayed as "o" ... which is acceptable ... but then once in a while other character oddities creep in (which I'm unable to reproduce right now).
There are also situations in which some characters in the console are not displayed correctly (which I guess means the console font isn't properly UTF-8).
So, what is one supposed to do ... it just seems impossible to find something that works in all cases. I'm fairly sure my setup is otherwise configured correctly (though as I don't use the console that often, I haven't spent my energy figuring out why some other characters in the shell are displayed incorrectly ... which is related to, but somewhat separate from, the man issue).
best ... khay
EDIT: removed [SOLVED]
Last edited by khayyam on Mon Jun 30, 2014 12:09 pm; edited 2 times in total
steveL Watchman
Joined: 13 Sep 2006 Posts: 5153 Location: The Peanut Gallery
Posted: Mon Jun 16, 2014 2:50 pm
khayyam: I'd ask in #bash on freenode. I can do that if you like, but it's going to be much better if you do, since you know what you're talking about and can respond live. It would definitely be on-topic in ##workingset, but you'll likely find more people who actually use nroff in #bash, and they definitely know about working with UTF-8.
Pop into #friendly-coders if you do come online.
igli.
khayyam Watchman
Joined: 07 Jun 2012 Posts: 6227 Location: Room 101
Posted: Sat Jun 21, 2014 7:15 pm
thanks steve ... but I don't think this is solvable. Full UTF-8 support on the CLI is basically impossible, as there are too many factors in play (and all eyes are on some other sphere of 'integration').
UTF-8 is supposed to be fully backward compatible with ASCII, but nroff/groff/troff obviously can't cope with the fact that the man pages are ASCII (go figure). No matter what conversion is done, some aspect of the display is incorrect; you just have to choose which is *least* annoying.
khayyam wrote: | If I use '-Tascii', bullet lists are displayed as "o" ... which is acceptable ... but then once in a while other character oddities creep in (which I'm unable to reproduce right now). |
I've since identified the issue: with -Tascii, umlauts (and other accented characters) come out like this:
Code: | Ulrich MA1/4ller <ulm@gentoo[dot]org> |
That basically means that '-mandoc -Tascii', '-mandoc -Tlatin1', and plain '-mandoc' (which should be UTF-8) all fail to produce correct output in some part of the man page.
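The "MA1/4ller" output looks like a double misinterpretation: "ü" is stored in the page as the two UTF-8 bytes 0xC3 0xBC; if groff reads its input as Latin-1, those become "Ã" and "¼", and the ascii device then approximates them as "A" and "1/4". A small Python sketch of that chain, assuming Latin-1 input handling; the two-entry fallback table is a hypothetical stand-in, not groff's real transliteration table.

```python
# Hypothetical subset of an ASCII fallback table: Ã -> "A", ¼ -> "1/4".
ASCII_FALLBACK = {"\u00c3": "A", "\u00bc": "1/4"}

def misread_as_latin1(text: str) -> str:
    """Interpret the UTF-8 bytes of text one byte per character (Latin-1)."""
    return text.encode("utf-8").decode("latin-1")

def to_ascii(text: str) -> str:
    """Apply the fallback table; characters not in it pass through unchanged."""
    return "".join(ASCII_FALLBACK.get(ch, ch) for ch in text)

name = "Ulrich M\u00fcller"
step1 = misread_as_latin1(name)  # 'Ulrich MÃ¼ller'
step2 = to_ascii(step1)          # 'Ulrich MA1/4ller'
print(step1)
print(step2)
```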
I didn't mention it previously, but I use rxvt-unicode as my terminal emulator, so there should be no issues in that regard.
best ... khay
mv Watchman
Joined: 20 Apr 2005 Posts: 6747
Posted: Sun Jun 22, 2014 8:16 am
AFAIK, groff is basically unable to handle UTF-8: the last release is years old, and most of the UTF-8 support is still on the to-do list.
Flameeyes once put heirloom-nroff into the tree; it could deal with UTF-8 but lacked too many of the groff extensions that many projects now use. For instance, I found it impossible to write an "ä" that prints correctly with both heirloom-nroff and groff: no matter how you specify the symbol, no form works with both... (and I even tried some sort of "programming" in *roff to solve the problem).
khayyam Watchman
Joined: 07 Jun 2012 Posts: 6227 Location: Room 101
Posted: Sun Jun 22, 2014 11:37 am
mv wrote: | AFAIK, groff is basically unable to handle utf8: The last version is years old, and most parts of utf8 are on the todo list. |
mv ... thanks, I guess this makes the advice in the wiki simply incorrect. Do you also see the issue above with dashes?
EDIT: ok, installing this UTF-8 groff wrapper and setting 'NROFF /usr/local/bin/groff-utf8 -Tutf8 -mandoc' in man.conf seems to have solved the above issues: hyphens at line breaks are rendered, bullet lists are correct, and umlauts/accents are correct. I'll mark this as [SOLVED] ... though I'll keep an eye on things and report any further oddities.
best ... khay
mv Watchman
Joined: 20 Apr 2005 Posts: 6747
Posted: Sun Jun 22, 2014 1:46 pm
I did not know about this wrapper. Why is it not in the Gentoo tree? I suppose many people have this problem...
khayyam Watchman
Joined: 07 Jun 2012 Posts: 6227 Location: Room 101
Posted: Sun Jun 22, 2014 2:52 pm
mv wrote: | I did not know about this wrapper. Why is it not in the gentoo tree? I suppose many people have this problem... |
mv ... no idea. Given how old the linked page is (Last modified: 5 July 2005), you'd think the issue would have seen some love ... but as with anything that's not about usability, the desktop, etc., these things tend to languish in the seventh circle of integration (a similar example is net-tools, which has been without a maintainer since 2002 ... and that lack of maintenance helped create the disparate tools, and the disunity, we see today).
best ... khay
khayyam Watchman
Joined: 07 Jun 2012 Posts: 6227 Location: Room 101
Posted: Mon Jun 30, 2014 12:09 pm
well ... I spoke too soon. Somehow (I assume due to the recent =dev-libs/icu-52.1 update) I'm back to the old behavior ... hyphens at line breaks are again some undisplayable character. Everything on the system is up to date (mostly stable); revdep-rebuild and @preserved-rebuild show nothing that needs rebuilding; and rebuilding man, groff, and the groff-utf8 wrapper makes no difference.
I'm not sure how the issue could be solved and then suddenly revert to the exact same behavior. I can't be sure it's icu, but that seems the obvious culprit ... yet groff isn't linked to icu in any way, so go figure.
EDIT: searching b.g.o, I see this comment from spanky ... "man-1.6f-r1 drops all -c and -T options by default and forces latest groff so as to avoid the dash funkiness" ... that's from Feb 2008, but I wonder whether this "dash funkiness" is the same as the above, and whether it was perhaps fixed but some other option, configure switch, or patch removal is required.
EDIT2: it seems Bug 121502 plays some role in this ... removing groff-1.19.2-man-unicode-dashes.patch makes no difference, however.
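For what it's worth, my understanding is that with -Tutf8 newer groff emits U+2010 HYPHEN for hyphens (including the one at a hyphenated line break) and U+2212 MINUS SIGN for \-, and a console font without those glyphs draws a block instead, which would match the "correc[]" output. A crude, hypothetical workaround is to post-filter the formatted text and map those characters back to ASCII. This Python sketch only illustrates the idea; it is not what the Gentoo patch does, and asciify_dashes() is a made-up name.

```python
# Hypothetical post-filter: map Unicode dash characters in formatted man
# output back to ASCII '-' for console fonts that lack these glyphs.
DASHES = str.maketrans({
    "\u2010": "-",  # HYPHEN
    "\u2011": "-",  # NON-BREAKING HYPHEN
    "\u2212": "-",  # MINUS SIGN
})

def asciify_dashes(text: str) -> str:
    """Replace the dash characters listed in DASHES with ASCII '-'."""
    return text.translate(DASHES)

print(asciify_dashes("builtin spelling correc\u2010"))  # builtin spelling correc-
```

The trade-off is losing the typographic distinction between hyphen and minus, which some people care about when copying options out of man pages.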
best ... khay
steveL Watchman
Joined: 13 Sep 2006 Posts: 5153 Location: The Peanut Gallery
Posted: Tue Nov 25, 2014 12:14 pm
@khayyam What about this topic?
I'm trying to get a definitive answer, so your input would be very useful.
iab,
igli.
khayyam Watchman
Joined: 07 Jun 2012 Posts: 6227 Location: Room 101
Posted: Wed Nov 26, 2014 11:56 am
steveL wrote: | @khayyam What about this topic? I'm trying to get a definitive answer, so your input would be very useful. |
hey steve ...
the '-k' switch to groff (recommended by r90 in the above linked post) doesn't resolve the issues ... I'd tried it some time back. Some characters are still displayed as blocks (specifically around email addresses, URLs, and line breaks within them). Umlauts (and similar characters), bullet lists, and regular hyphenated line breaks are displayed correctly, so it is an improvement over the standard configuration (though the same can be achieved with the groff-utf8 wrapper linked above).
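As I understand it, groff's -k switch runs the input through preconv, which rewrites non-ASCII characters as groff's \[uXXXX] Unicode escapes so that troff does not have to guess the input encoding. A minimal Python sketch of that core step, assuming the input is already decoded text; to_groff_escapes() is a made-up name, and the real preconv also detects the input encoding and handles other details.

```python
def to_groff_escapes(text: str) -> str:
    """Replace each non-ASCII character with a groff \\[uXXXX] escape.

    Only the core idea of preconv; encoding detection etc. is left out.
    """
    return "".join(
        ch if ord(ch) < 128 else f"\\[u{ord(ch):04X}]"  # e.g. ü -> \[u00FC]
        for ch in text
    )

print(to_groff_escapes("Ulrich M\u00fcller"))  # Ulrich M\[u00FC]ller
```

This fixes the input side (the "MA1/4ller" case), but whether the resulting glyph can actually be shown still depends on the output device and the display font, which fits the remaining blocks described above.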
In short, from the time I've spent poking at the problem, I've concluded there is no way to get man pages to display correctly in all respects ... it's basically irresolvable.
best ... khay
steveL Watchman
Joined: 13 Sep 2006 Posts: 5153 Location: The Peanut Gallery
Posted: Wed Nov 26, 2014 6:40 pm
OK, VinzC's solution seems very similar to yours:
/etc/man.conf:
Code: | TROFF /usr/local/bin/groff-utf8 -Tps -mandoc
NROFF /usr/local/bin/groff-utf8 -Tutf8 -mandoc
JNROFF /usr/local/bin/groff-utf8 -Tnippon -mandocj |
so I'll go with that for now then.
Not sure what the exact issue is, beyond not having all the glyphs available in the display font.
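One way to narrow down which glyphs are missing is to list the distinct non-ASCII characters in the formatted output of an affected page and check them against the console font. A small Python sketch; non_ascii_glyphs() is a hypothetical helper, and the sample string just stands in for real man output.

```python
import unicodedata

def non_ascii_glyphs(text: str) -> list[str]:
    """List distinct non-ASCII characters in text as 'U+XXXX NAME', by code point."""
    seen = sorted({ch for ch in text if ord(ch) > 127})
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in seen]

sample = "\u2022 bullet, correc\u2010\ntion, M\u00fcller"
for entry in non_ascii_glyphs(sample):
    print(entry)
# U+00FC LATIN SMALL LETTER U WITH DIAERESIS
# U+2010 HYPHEN
# U+2022 BULLET
```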
If you have any corrections to the above, please let me know.