Author |
Message |
netjiro n00b
Joined: 30 Dec 2004 Posts: 47 Location: Liechtenstein
|
Posted: Sun Jan 07, 2018 10:24 am Post subject: [SOLVED-ish] partial UTF8 problems in some kde ... |
|
|
I must have missed something; I'd be thankful for ideas on how to troubleshoot and fix this.
I thought I had done the installation according to tfm (https://wiki.gentoo.org/wiki/UTF-8).
Most of the system works perfectly fine with UTF-8 support.
BUT:
dolphin: click and drag'n'drop on a file with a non-ASCII filename do not work; an error message shows the filename as interpreted in ASCII. However, dolphin lists the UTF-8 filename correctly.
firefox: saving a file with a non-ASCII character in its name creates two files: one with the correct UTF-8 filename encoding but 0 bytes long, and one with an extended-ASCII (CP1252?) filename encoding that actually contains the data.
Code: | # cat /etc/env.d/02locale
LANG="en_US.utf8"
# eselect locale list
[4] en_US.utf8 *
# locale -a | grep utf
en_US.utf8
|
Any idea what I'm missing?
Last edited by netjiro on Sun Jan 07, 2018 1:28 pm; edited 1 time in total |
|
mike155 Advocate
Joined: 17 Sep 2010 Posts: 4438 Location: Frankfurt, Germany
|
Posted: Sun Jan 07, 2018 10:47 am Post subject: |
|
|
What is the output of
Code: | export | egrep "(LANG|LC_)" |
|
|
netjiro n00b
Joined: 30 Dec 2004 Posts: 47 Location: Liechtenstein
|
Posted: Sun Jan 07, 2018 10:51 am Post subject: |
|
|
Quick further investigation:
Testing by creating filenames with a "€" sign in them:
Octal codes (from LC_ALL=C ls -b):
kterm-euro:\342\202\254
dolphin-euro:\342\202\254
mozilla-euro:\302\254 (and a second file with just the single byte \254)
So both kterm and dolphin create the same bytes, E2 82 AC, for "€",
while mozilla creates one file with C2 AC and one with AC for the "€" character.
kterm and dolphin can, however, still display the mozilla version C2 AC as "€", even though they themselves create it as E2 82 AC.
And none of them create the 0x20AC I was naively expecting since the unicode 0x20AC is listed as "€". (http://www.unicodemap.org/details/0x20AC/index.html) Or perhaps that's for UTF16 only.
This is weird.
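To compare the octal escapes from ls -b with hex bytes, a small helper like the one below does the conversion. This is only a sketch; the function name octal_to_hex is made up for illustration.

```shell
# Convert the octal escapes printed by `ls -b` into hex bytes.
# octal_to_hex is a hypothetical helper name, not an existing tool.
octal_to_hex() {
    local out o
    out=""
    for o in "$@"; do
        # A leading 0 makes shell arithmetic read the digits as octal.
        out="$out$(printf '%02X' "$((0$o))") "
    done
    # Drop the trailing space before printing.
    printf '%s\n' "${out% }"
}

octal_to_hex 342 202 254   # kterm and dolphin: E2 82 AC
octal_to_hex 302 254       # mozilla, two-byte name: C2 AC
octal_to_hex 254           # mozilla, single-byte name: AC
```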
Last edited by netjiro on Sun Jan 07, 2018 11:02 am; edited 1 time in total |
|
netjiro n00b
Joined: 30 Dec 2004 Posts: 47 Location: Liechtenstein
|
Posted: Sun Jan 07, 2018 10:54 am Post subject: |
|
|
@mike155
Code: | # export | egrep "(LANG|LC_)"
declare -x LANG="en_US.utf8"
|
|
|
mike155 Advocate
Joined: 17 Sep 2010 Posts: 4438 Location: Frankfurt, Germany
|
Posted: Sun Jan 07, 2018 11:02 am Post subject: |
|
|
Quote: | I was naively expecting since the unicode 0x20AC is listed as "€" |
No. U+20AC is the Unicode code point. E2 82 AC is the UTF-8 hexadecimal encoding. There is a big difference between code points and UTF-8, UTF-16 and UTF-32 encodings. |
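To make the difference concrete, here is a minimal sketch of how a code point is turned into UTF-8 bytes. The function name utf8_bytes is made up, and the sketch does no validation (surrogates and values above U+10FFFF are not rejected).

```shell
# Print the UTF-8 bytes (in hex) for a Unicode code point given in hex.
# utf8_bytes is a hypothetical helper; it does not validate its input.
utf8_bytes() {
    local cp
    cp=$((0x$1))
    if [ "$cp" -lt 128 ]; then        # U+0000..U+007F: 1 byte
        printf '%02X\n' "$cp"
    elif [ "$cp" -lt 2048 ]; then     # U+0080..U+07FF: 2 bytes
        printf '%02X %02X\n' \
            "$((0xC0 | (cp >> 6)))" \
            "$((0x80 | (cp & 0x3F)))"
    elif [ "$cp" -lt 65536 ]; then    # U+0800..U+FFFF: 3 bytes
        printf '%02X %02X %02X\n' \
            "$((0xE0 | (cp >> 12)))" \
            "$((0x80 | ((cp >> 6) & 0x3F)))" \
            "$((0x80 | (cp & 0x3F)))"
    else                              # U+10000 and up: 4 bytes
        printf '%02X %02X %02X %02X\n' \
            "$((0xF0 | (cp >> 18)))" \
            "$((0x80 | ((cp >> 12) & 0x3F)))" \
            "$((0x80 | ((cp >> 6) & 0x3F)))" \
            "$((0x80 | (cp & 0x3F)))"
    fi
}

utf8_bytes 20AC   # E2 82 AC
utf8_bytes AC     # C2 AC
```

Note that C2 AC is the UTF-8 encoding of U+00AC, not of U+20AC. That would be consistent with the character having been cut down to its low byte AC somewhere before the filename was written, but that is only a guess.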
|
netjiro n00b
Joined: 30 Dec 2004 Posts: 47 Location: Liechtenstein
|
Posted: Sun Jan 07, 2018 11:05 am Post subject: |
|
|
mike155 wrote: | Quote: | I was naively expecting since the unicode 0x20AC is listed as "€" |
No. U+20AC is the Unicode code point. E2 82 AC is the UTF-8 hexadecimal encoding. There is a big difference between code points and UTF-8, UTF-16 and UTF-32 encodings. |
Yep, should have known that. (hangs head in shame)
But what is mozilla doing, and why can't dolphin click/drag'n'drop files with non-ASCII names?
I'm missing something somewhere... |
|
mike155 Advocate
Joined: 17 Sep 2010 Posts: 4438 Location: Frankfurt, Germany
|
Posted: Sun Jan 07, 2018 11:19 am Post subject: |
|
|
Code: | declare -x LANG="en_US.utf8" |
It's probably because LC environment variables are missing. Try setting LC_ALL to "en_US.utf8" before starting your desktop environment. If you need more control, you can set individual LC environment variables instead of LC_ALL. The output of the command above on my computer is:
Code: | declare -x LANG="de_DE.utf8"
declare -x LANGUAGE="de_DE.utf8"
declare -x LC_ADDRESS="de_DE.utf8"
declare -x LC_COLLATE="POSIX"
declare -x LC_CTYPE="de_DE.utf8"
declare -x LC_IDENTIFICATION="de_DE.utf8"
declare -x LC_MEASUREMENT="de_DE.UTF-8"
declare -x LC_MESSAGES="en_US.utf8"
declare -x LC_MONETARY="de_DE.UTF-8"
declare -x LC_NAME="de_DE.utf8"
declare -x LC_NUMERIC="de_DE.UTF-8"
declare -x LC_PAPER="de_DE.UTF-8"
declare -x LC_TELEPHONE="de_DE.utf8"
declare -x LC_TIME="de_DE.UTF-8" |
|
|
netjiro n00b
Joined: 30 Dec 2004 Posts: 47 Location: Liechtenstein
|
Posted: Sun Jan 07, 2018 11:24 am Post subject: |
|
|
Great :) Thanks !
Where is the best place/way to set them?
Where do you do it? |
|
netjiro n00b
Joined: 30 Dec 2004 Posts: 47 Location: Liechtenstein
|
Posted: Sun Jan 07, 2018 11:34 am Post subject: |
|
|
Hmm, the "locale" command lists the variables, but "export" does not:
Code: | # locale
LANG=en_US.utf8
LC_CTYPE="en_US.utf8"
LC_NUMERIC="en_US.utf8"
LC_TIME="en_US.utf8"
LC_COLLATE="en_US.utf8"
LC_MONETARY="en_US.utf8"
LC_MESSAGES="en_US.utf8"
LC_PAPER="en_US.utf8"
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT="en_US.utf8"
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=
|
Code: | # export | egrep "(LANG|LC_)"
declare -x LANG="en_US.utf8"
|
|
|
mike155 Advocate
Joined: 17 Sep 2010 Posts: 4438 Location: Frankfurt, Germany
|
Posted: Sun Jan 07, 2018 11:55 am Post subject: |
|
|
Quote: | It's probably because LC environment variables are missing. |
I re-checked the documentation and I think I was wrong. Nowadays it should be sufficient to set LANG. |
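This also explains why "locale" lists every LC_* category while "export" shows only LANG: the locale utility prints the effective value of each category and puts implied (not explicitly set) values in double quotes. A rough sketch of the lookup order described by POSIX follows; the function name effective_locale is made up, and a real C library does more than this.

```shell
# Resolve the effective value of one locale category:
# LC_ALL wins, then the category's own variable, then LANG, else "POSIX".
# effective_locale is a hypothetical helper; pass a plain variable name only,
# because the name is expanded with eval.
effective_locale() {
    local category value
    category=$1
    eval "value=\${$category:-}"
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$value" ]; then
        printf '%s\n' "$value"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'POSIX\n'
    fi
}

# Prints whatever applies in the current environment.
effective_locale LC_CTYPE
```

With only LANG="en_US.utf8" set, every category resolves to en_US.utf8, which matches the "locale" output earlier in the thread.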
|
mike155 Advocate
Joined: 17 Sep 2010 Posts: 4438 Location: Frankfurt, Germany
|
Posted: Sun Jan 07, 2018 12:39 pm Post subject: |
|
|
Quote: | firefox: saving a file with a non-ascii character creates two files: one with the correct utf8 filename encoding but with 0 byte length, and one file with an extended ascii (CP1252 ?) encoding actually containing the data. |
I can reproduce this if I unset all LANG, LANGUAGE and LC environment variables, start firefox, select File -> 'Save Page As' and enter a filename with non-ascii characters.
If I set the LANG environment variable, Firefox works as expected. |
|
netjiro n00b
Joined: 30 Dec 2004 Posts: 47 Location: Liechtenstein
|
Posted: Sun Jan 07, 2018 1:12 pm Post subject: |
|
|
Great!
I just tested here:
If I start firefox from the kde launcher or other kde environment it will misbehave.
If I start firefox from a kterm console it will behave well and use the correct encoding.
So there is something wrong in my KDE environment, outside kterm console login.
In kterm console login:
Code: | # export | grep LANG
declare -x LANG="en_US.utf8"
declare -x LANGUAGE=""
|
And it behaves the same if I set LANGUAGE:
Code: | # export LANGUAGE="en_US.utf8"
# export | grep LANG
declare -x LANG="en_US.utf8"
declare -x LANGUAGE="en_US.utf8"
|
My .bashrc and .bash_profile just source /etc/profile and /etc/profile.d/bash-completion, add a local path, and set umask and prompt.
I checked that no LANG* or LC_* variable is set in /etc/profile.
But I see that LANG is exported (export LANG='en_US.utf8') in the autogenerated /etc/profile.env.
So the question becomes: where and how is my KDE environment set, and how can I check what is declared in that environment? |
|
mike155 Advocate
Joined: 17 Sep 2010 Posts: 4438 Location: Frankfurt, Germany
|
Posted: Sun Jan 07, 2018 1:17 pm Post subject: |
|
|
You could try the following:
1) rename /usr/bin/firefox
2) Create a shell script "/usr/bin/firefox" with the contents below
Code: | #!/bin/bash
export > /tmp/firefox-export.txt |
3) Start Firefox from KDE launcher
4) Look at the file /tmp/firefox-export.txt to see which environment variables are passed to Firefox if started from KDE launcher
5) Move the original Firefox executable back to /usr/bin/firefox |
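An alternative that avoids touching /usr/bin/firefox, assuming a Linux system with /proc mounted: the environment a process was started with can be read from /proc/<pid>/environ, where entries are separated by NUL bytes. The function name below is made up, and the process name "firefox" in the usage comment is an assumption (it may differ on your system).

```shell
# Print the locale-related variables from a NUL-separated environment dump,
# such as /proc/<pid>/environ on Linux.
# locale_vars_from_environ is a hypothetical helper name.
locale_vars_from_environ() {
    tr '\0' '\n' < "$1" | grep -E '^(LANG|LANGUAGE|LC_[A-Z_]+)='
}

# Usage (assumes a running process named "firefox"):
#   locale_vars_from_environ "/proc/$(pgrep -o firefox)/environ"
```

If this prints nothing for a Firefox started from the KDE launcher, the launcher did not pass any locale variables to it.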
|
netjiro n00b
Joined: 30 Dec 2004 Posts: 47 Location: Liechtenstein
|
Posted: Sun Jan 07, 2018 1:27 pm Post subject: |
|
|
smooth :)
I found (rtfm) that the Plasma shell executes any *.sh file in ~/.config/plasma-workspace/env/ and honours those settings for the session. So I created a small script there that exports LANG and LANGUAGE.
Works fine now. Solved both firefox and dolphin issues.
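A minimal sketch of such a script, for anyone finding this later. The file name locale.sh is arbitrary and made up here; the values are the ones used earlier in this thread. It presumably needs a fresh login to take effect.

```shell
# ~/.config/plasma-workspace/env/locale.sh
# (hypothetical file name; any name ending in .sh in that directory should do)
export LANG="en_US.utf8"
export LANGUAGE="en_US.utf8"
```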
Great help man !
Question remains, why did my KDE session not grab the system LANG settings as default? |
|