Yikes. After lots and lots of painful experimentation and kernel source code reading, I've determined that, if I'm understanding everything correctly, Linux's keymap support is really, really, REALLY broken. Is that even possible?
It looks like Gentoo's DUMPKEYS_CHARSET is a mistake on someone's part. It is basically being used to tell dumpkeys what encoding an earlier invocation of loadkeys autodetected. That doesn't make sense to me as-is, and it makes even less sense when you consider that loadkeys autodetects the encoding on a key-by-key basis, so there's pretty much no right answer for this setting unless you're using iso-8859-1. That certainly would explain the poor documentation for what to put there! The fix (again, assuming I'm understanding everything right) should be to remove the lines:
local dumpkey_opts=""
[[ -n ${DUMPKEYS_CHARSET} ]] && dumpkey_opts="-c ${DUMPKEYS_CHARSET}"
dumpkeys ${dumpkey_opts} | loadkeys --unicode
from /etc/init.d/keymaps, and instead add an if-statement a few lines above that which appends "--unicode" to the /bin/loadkeys command line whenever UNICODE="yes". Is it possible to tell a developer about this, assuming I'm right (which I'd rather not assume without some confirmation)?
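Here is a minimal sketch of what I mean. The helper name loadkeys_opts is made up for illustration and is not something in the real /etc/init.d/keymaps, and I'm assuming the keymap name is available in a variable (called KEYMAP here). The "--unicode" flag is the same one the removed lines already pass to loadkeys:

```shell
#!/bin/sh
# Hypothetical helper: print the extra loadkeys options for the current config.
# UNICODE is the existing rc setting; the function name is illustrative only.
loadkeys_opts() {
    if [ "${UNICODE:-}" = "yes" ]; then
        printf '%s\n' "--unicode"
    fi
}

# Intended use inside the init script (assumed shape, not executed here):
#   /bin/loadkeys $(loadkeys_opts) "${KEYMAP}"
```

That way loadkeys is told about Unicode once, up front, and there is no dumpkeys round trip that needs a charset guess.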
But there are more problems. Unless I misunderstand, "compose" definitions (which are the coolest thing ever, when they work) are only allowed for single-byte encodings, and must always be specifically iso-8859-1 when Unicode is in use. Great. So as long as we're all Western Europeans there's no problem. I consider this a bug in the keymap file format, and therefore a bug in loadkeys and dumpkeys. Not Gentoo's fault, I guess.
But wait, there's more. I'm less sure about this one, but it really does appear that the kernel cannot handle Unicode characters above 0xEFFF. Since that leaves out all the astral planes, as well as a few categories of special characters in plane 0, it doesn't seem like full Unicode support to me.
Actually, though, I still can't understand why the decision was made to try to support Unicode as this whole other category of life instead of as just another charset (as mentioned in my original post). I don't think the keyboard driver should know what UTF-8 is at all. loadkeys and dumpkeys should know about it, and should just map each keysym to the appropriate "macro" of several successive action codes, so that the kernel outputs the correct multibyte sequence when a key is pressed. This, combined with a fix to make compose definitions work based on keysyms instead of literal single-byte values, would completely fix everything and end the scourge of console-mode keyboard localization problems for all time. More importantly, I would finally be able to set up the compose definitions I've been aching for.
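To make the "macro" idea concrete, here is a rough sketch of the byte sequence loadkeys would have to compute for one keysym. The function name utf8_bytes is hypothetical, it only prints the bytes in hex rather than touching any keymap, and it does no validation (surrogates and values above 0x10FFFF are not rejected):

```shell
#!/bin/sh
# Hypothetical helper: print the UTF-8 bytes (in hex) for one code point.
# This is the sequence a per-key "macro" would have to make the kernel emit.
utf8_bytes() {
    cp=$(($1))   # accepts decimal or 0x-prefixed hex
    if [ "$cp" -lt 128 ]; then
        # 1 byte: 0xxxxxxx
        printf '%02x\n' "$cp"
    elif [ "$cp" -lt 2048 ]; then
        # 2 bytes: 110xxxxx 10xxxxxx
        printf '%02x %02x\n' \
            $((0xC0 | (cp >> 6))) $((0x80 | (cp & 0x3F)))
    elif [ "$cp" -lt 65536 ]; then
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        printf '%02x %02x %02x\n' \
            $((0xE0 | (cp >> 12))) $((0x80 | ((cp >> 6) & 0x3F))) \
            $((0x80 | (cp & 0x3F)))
    else
        # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        printf '%02x %02x %02x %02x\n' \
            $((0xF0 | (cp >> 18))) $((0x80 | ((cp >> 12) & 0x3F))) \
            $((0x80 | ((cp >> 6) & 0x3F))) $((0x80 | (cp & 0x3F)))
    fi
}

utf8_bytes 0x20AC   # prints: e2 82 ac
```

With something like this in loadkeys, the kernel would only ever see plain byte values and would not need to know about UTF-8 at all.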
I would be deliriously happy to fix all of these myself, except that I doubt I would know how or where to post the fixes to get anyone else to use them. I also get the distinct feeling nobody really cares much about any of this console Unicode junk anyway, so I'd probably be the only user, which would make me feel like a pathetic loser. So barring a sudden surge of interest in this monologue thread, I'm probably not going to do it.
Heh, well in case anyone was interested, there's the final conclusion. I'll leave you alone now.
