Gentoo Forums
[SOLVED-ish] partial UTF8 problems in some kde ...
netjiro
n00b

Joined: 30 Dec 2004
Posts: 47
Location: Liechtenstein

Posted: Sun Jan 07, 2018 10:24 am    Post subject: [SOLVED-ish] partial UTF8 problems in some kde ...

I must have missed something. I'd be thankful for ideas on how to troubleshoot and fix this:

I thought I had done the installation according to tfm (https://wiki.gentoo.org/wiki/UTF-8)
Most of the system works just perfectly fine with utf8 support.

BUT:
dolphin: click or drag'n'drop on a file with a non-ASCII filename does not work; it shows an error message containing the filename interpreted as ASCII. However, dolphin lists the utf8 filename correctly.
firefox: saving a file with a non-ascii character creates two files: one with the correct utf8 filename encoding but with 0 byte length, and one file with an extended ascii (CP1252 ?) encoding actually containing the data.

Code:
# cat /etc/env.d/02locale
LANG="en_US.utf8"

# eselect locale list
  [4]   en_US.utf8 *

# locale -a | grep utf
en_US.utf8


Any idea what I'm missing?


Last edited by netjiro on Sun Jan 07, 2018 1:28 pm; edited 1 time in total

mike155
Advocate

Joined: 17 Sep 2010
Posts: 4438
Location: Frankfurt, Germany

Posted: Sun Jan 07, 2018 10:47 am

What is the output of
Code:
export | egrep "(LANG|LC_)"

netjiro
n00b

Joined: 30 Dec 2004
Posts: 47
Location: Liechtenstein

Posted: Sun Jan 07, 2018 10:51 am

Quick further investigation:
Testing by creating filenames with a "€" sign in them:

octal codes: (LC_ALL=C ls -b)
kterm-euro:\342\202\254
dolphin-euro:\342\202\254
mozilla-euro:\302\254 -- and just single byte -- \254

So: both kterm and dolphin will create the same 0xE282AC for "€"
while mozilla creates one file with: 0xC2AC and one with 0xAC for the "€" character.

kterm and dolphin can however still display the mozilla version 0xC2AC as "€" even if they themselves create it as 0xE282AC
And none of them create the 0x20AC I was naively expecting since the unicode 0x20AC is listed as "€". (http://www.unicodemap.org/details/0x20AC/index.html) Or perhaps that's for UTF16 only.

This is weird.


Last edited by netjiro on Sun Jan 07, 2018 11:02 am; edited 1 time in total

netjiro
n00b

Joined: 30 Dec 2004
Posts: 47
Location: Liechtenstein

Posted: Sun Jan 07, 2018 10:54 am

@mike155

Code:
# export | egrep "(LANG|LC_)"
declare -x LANG="en_US.utf8"

mike155
Advocate

Joined: 17 Sep 2010
Posts: 4438
Location: Frankfurt, Germany

Posted: Sun Jan 07, 2018 11:02 am

Quote:
I was naively expecting since the unicode 0x20AC is listed as "€"

No. U+20AC is the Unicode code point. E2 82 AC is the UTF-8 hexadecimal encoding. There is a big difference between code points and UTF-8, UTF-16 and UTF-32 encodings.
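
To make the difference concrete, here is a minimal sketch of how a code point is turned into UTF-8 bytes. The helper name utf8_hex is made up for illustration. It uses only shell arithmetic, so it does not depend on the current locale, and it does not validate its input (surrogates and values above U+10FFFF are not rejected).

```shell
#!/bin/sh
# utf8_hex (hypothetical helper): print the UTF-8 bytes, in hex, of one
# Unicode code point given as a hex number, e.g. "utf8_hex 20AC".
utf8_hex() {
    cp=$((0x$1))
    if [ "$cp" -lt 128 ]; then            # 1 byte:  0xxxxxxx
        printf '%02X\n' "$cp"
    elif [ "$cp" -lt 2048 ]; then         # 2 bytes: 110xxxxx 10xxxxxx
        printf '%02X %02X\n' \
            $(( 0xC0 | (cp >> 6) )) $(( 0x80 | (cp & 0x3F) ))
    elif [ "$cp" -lt 65536 ]; then        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        printf '%02X %02X %02X\n' \
            $(( 0xE0 | (cp >> 12) )) $(( 0x80 | ((cp >> 6) & 0x3F) )) \
            $(( 0x80 | (cp & 0x3F) ))
    else                                  # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        printf '%02X %02X %02X %02X\n' \
            $(( 0xF0 | (cp >> 18) )) $(( 0x80 | ((cp >> 12) & 0x3F) )) \
            $(( 0x80 | ((cp >> 6) & 0x3F) )) $(( 0x80 | (cp & 0x3F) ))
    fi
}

utf8_hex 20AC   # EURO SIGN
utf8_hex AC     # NOT SIGN
```

The first call prints E2 82 AC, the bytes kterm and dolphin wrote (\342\202\254 in octal). The second prints C2 AC, which matches the \302\254 name from mozilla. That would fit the code point having been cut down to its low byte 0xAC before encoding, but that reading is only a guess.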

netjiro
n00b

Joined: 30 Dec 2004
Posts: 47
Location: Liechtenstein

Posted: Sun Jan 07, 2018 11:05 am

mike155 wrote:
Quote:
I was naively expecting since the unicode 0x20AC is listed as "€"

No. U+20AC is the Unicode code point. E2 82 AC is the UTF-8 hexadecimal encoding. There is a big difference between code points and UTF-8, UTF-16 and UTF-32 encodings.

Yep, should have known that. (hang my head in shame)

But what is mozilla doing, and why can't dolphin click/drag'n'drop files with non-ASCII names?
I'm missing something somewhere...

mike155
Advocate

Joined: 17 Sep 2010
Posts: 4438
Location: Frankfurt, Germany

Posted: Sun Jan 07, 2018 11:19 am

Code:
declare -x LANG="en_US.utf8"

It's probably because LC environment variables are missing. Try to set LC_ALL to "en_US.utf8" before starting your desktop environment. If you need more control, you can set individual LC environment variables (instead of LC_ALL). Output of the command above on my computer is:
Code:
declare -x LANG="de_DE.utf8"
declare -x LANGUAGE="de_DE.utf8"
declare -x LC_ADDRESS="de_DE.utf8"
declare -x LC_COLLATE="POSIX"
declare -x LC_CTYPE="de_DE.utf8"
declare -x LC_IDENTIFICATION="de_DE.utf8"
declare -x LC_MEASUREMENT="de_DE.UTF-8"
declare -x LC_MESSAGES="en_US.utf8"
declare -x LC_MONETARY="de_DE.UTF-8"
declare -x LC_NAME="de_DE.utf8"
declare -x LC_NUMERIC="de_DE.UTF-8"
declare -x LC_PAPER="de_DE.UTF-8"
declare -x LC_TELEPHONE="de_DE.utf8"
declare -x LC_TIME="de_DE.UTF-8"

netjiro
n00b

Joined: 30 Dec 2004
Posts: 47
Location: Liechtenstein

Posted: Sun Jan 07, 2018 11:24 am

Great :) Thanks !

Where is the best place/way to set them?
Where do you do it?

netjiro
n00b

Joined: 30 Dec 2004
Posts: 47
Location: Liechtenstein

Posted: Sun Jan 07, 2018 11:34 am

Hmm, the "locale" command lists the variables, but "export" does not:

Code:
# locale
LANG=en_US.utf8
LC_CTYPE="en_US.utf8"
LC_NUMERIC="en_US.utf8"
LC_TIME="en_US.utf8"
LC_COLLATE="en_US.utf8"
LC_MONETARY="en_US.utf8"
LC_MESSAGES="en_US.utf8"
LC_PAPER="en_US.utf8"
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT="en_US.utf8"
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=


Code:
# export | egrep "(LANG|LC_)"
declare -x LANG="en_US.utf8"
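
That difference is expected: export shows only variables that are actually set, while locale prints the effective value for every category and puts double quotes around values that are implied by LANG or LC_ALL rather than set directly. A rough sketch of the lookup order (LC_ALL first, then the category's own variable, then LANG, then POSIX), with effective_locale as a made-up helper name:

```shell
#!/bin/sh
# effective_locale (hypothetical helper): print the locale that applies to one
# category, following the POSIX precedence LC_ALL > LC_<category> > LANG.
# Empty variables count as unset.
effective_locale() {
    category=$1                         # e.g. LC_CTYPE
    eval "own=\${$category:-}"          # value of the category's own variable
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$own" ]; then
        printf '%s\n' "$own"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' "POSIX"
    fi
}

effective_locale LC_CTYPE
```

With only LANG="en_US.utf8" set, as in the export output above, every category resolves to en_US.utf8, which is what locale reports.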

mike155
Advocate

Joined: 17 Sep 2010
Posts: 4438
Location: Frankfurt, Germany

Posted: Sun Jan 07, 2018 11:55 am

Quote:
It's probably because LC environment variables are missing.

I re-checked the documentation and I think I was wrong. Nowadays it should be sufficient to set LANG.

mike155
Advocate

Joined: 17 Sep 2010
Posts: 4438
Location: Frankfurt, Germany

Posted: Sun Jan 07, 2018 12:39 pm

Quote:
firefox: saving a file with a non-ascii character creates two files: one with the correct utf8 filename encoding but with 0 byte length, and one file with an extended ascii (CP1252 ?) encoding actually containing the data.

I can reproduce this if I unset all LANG, LANGUAGE and LC environment variables, start firefox, select File -> 'Save Page As' and enter a filename with non-ascii characters.

If I set the LANG environment variable, Firefox works as expected
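
For anyone who wants to repeat this test without changing their login shell, here is a sketch of one way to start a program with all locale variables removed. run_without_locale is a made-up helper name, not an existing tool.

```shell
#!/bin/sh
# run_without_locale (hypothetical helper): run a command with LANG, LANGUAGE
# and every LC_* variable removed from its environment.
run_without_locale() {
    (
        unset LANG LANGUAGE
        # Collect the names of all exported LC_* variables and unset each one.
        for name in $(env | sed -n 's/^\(LC_[A-Z_]*\)=.*/\1/p'); do
            unset "$name"
        done
        exec "$@"
    )
}

# Show which locale variables a child process would still see (none expected).
run_without_locale env | grep -E '^(LANG|LANGUAGE|LC_)' || echo "no locale variables set"
```

The subshell keeps the calling shell's own variables untouched. Running "run_without_locale firefox" would then start the browser in the stripped-down environment.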

netjiro
n00b

Joined: 30 Dec 2004
Posts: 47
Location: Liechtenstein

Posted: Sun Jan 07, 2018 1:12 pm

Great!

I just tested here:
If I start firefox from the kde launcher or other kde environment it will misbehave.
If I start firefox from a kterm console it will behave well and use the correct encoding.

So there is something wrong in my KDE environment, outside kterm console login.

In kterm console login:
Code:
# export | grep LANG
declare -x LANG="en_US.utf8"
declare -x LANGUAGE=""

And it behaves the same if I set LANGUAGE:
Code:
# export LANGUAGE="en_US.utf8"
# export | grep LANG
declare -x LANG="en_US.utf8"
declare -x LANGUAGE="en_US.utf8"


My .bashrc and .bash_profile just source /etc/profile, /etc/profile.d/bash-completion, add a local path, and set umask and prompt.

Checked that no LANG* or LC_* is set in /etc/profile

But I see that LANG is exported as export LANG='en_US.utf8' in the autogenerated /etc/profile.env

So, then the question becomes, where and how is my KDE environment set and how can I check what is declared in that environment?

mike155
Advocate

Joined: 17 Sep 2010
Posts: 4438
Location: Frankfurt, Germany

Posted: Sun Jan 07, 2018 1:17 pm

You could try the following:

1) Rename /usr/bin/firefox
2) Create an executable shell script "/usr/bin/firefox" with the contents below
Code:
#!/bin/bash
# Dump the exported environment this wrapper was started with (Firefox itself is not launched)
export > /tmp/firefox-export.txt

3) Start Firefox from KDE launcher
4) Look at the file /tmp/firefox-export.txt to see which environment variables are passed to Firefox if started from KDE launcher
5) Move the original Firefox executable back to /usr/bin/firefox
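
A less invasive variant, assuming a Linux system with /proc mounted and pgrep available, is to read the environment of the already running process instead of replacing the binary. The helper name locale_vars_from_environ and the process name "firefox" passed to pgrep are assumptions for illustration; the actual process name may differ.

```shell
#!/bin/sh
# locale_vars_from_environ (hypothetical helper): print the locale-related
# variables found in a NUL-separated environment file such as /proc/PID/environ.
locale_vars_from_environ() {
    tr '\000' '\n' < "$1" | grep -E '^(LANG|LANGUAGE|LC_[A-Z]+)='
}

# Example: inspect the oldest running process named "firefox", if there is one.
pid=$(pgrep -o -x firefox || true)
if [ -n "$pid" ]; then
    locale_vars_from_environ "/proc/$pid/environ"
fi
```

Note that /proc/PID/environ reflects the environment the process was started with, which is exactly what matters here. No output means none of the locale variables were passed to the process.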

netjiro
n00b

Joined: 30 Dec 2004
Posts: 47
Location: Liechtenstein

Posted: Sun Jan 07, 2018 1:27 pm

smooth :)

I found (rtfm) that plasma shell will execute any *.sh in ~/.config/plasma-workspace/env/ and honour those settings for the session. So I created a small script there and exported LANG and LANGUAGE.
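
For reference, a minimal sketch of such a script. The file name locale.sh is an assumption (the description above only requires that it ends in .sh and lives in that directory), and the values simply mirror the LANG from /etc/env.d/02locale earlier in the thread.

```shell
# ~/.config/plasma-workspace/env/locale.sh  (file name is an assumption)
# Export the locale for the Plasma session so that applications started
# from the launcher inherit it.
export LANG="en_US.utf8"
export LANGUAGE="en_US.utf8"
```

The settings should take effect at the next login to the Plasma session.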

Works fine now. Solved both firefox and dolphin issues.

Great help man !


The question remains: why did my KDE session not pick up the system LANG settings by default?

All times are GMT