Gentoo Forums
freerdp - Cut english, paste Japanese
grooveman
Veteran

Joined: 24 Feb 2003
Posts: 1217

Posted: Wed Feb 01, 2017 10:09 pm    Post subject: freerdp - Cut english, paste Japanese

Hi, I use FreeRDP all the time to remote control Windows servers. The other day I was remoted into a 2012R2 server and running SQL Server Management Studio. I was trying to delete some rows in a table when I noticed something very odd. It was a very simple SQL command, but I kept getting this error:

Code:
Msg 8152, Level 16, State 13, Line 1
String or binary data would be truncated.


Which is the type of error you get when you are trying to cram too many characters into a field. But, obviously, I wasn't doing that. I was trying to delete a record. So, I sent my SQL code to a colleague, who ran it exactly as I sent it on the same system, and it worked without an error.

I noticed that nearly all my SQL commands were giving me this error. Then I tried it by remoting in with a Windows client -- and it worked as expected. No errors. I tinkered around a little bit to find out what was going on.

Then I decided to try an experiment. I opened two sessions using FreeRDP, to two different 2012R2 servers. I opened SQL Server Management Studio, and cut the code from one session into the other. I was surprised by the result.

The command I cut was in English (well, using English characters anyway), but the characters that came out when I pasted it were Japanese! I took SQL Server Management Studio out of the equation and just pasted from one Notepad to the other, and it was 100% replicable. Cut English, Paste Japanese.

Cut from System A:
Code:
Hi Everybody!

Pasted into System B:
Code:
楈䔠敶祲潢祤A喪


Cut from System B:
Code:
Hi Dr. Nick!

Pasted into System A:
Code:
楈䐠⹲丠捩


(It looks only like blocks on my Linux box, but trust me, it is Japanese on the Windows systems.)

Anyone have any idea what is going on here? Can this be fixed, or is it a bug? Is this a UTF/ASCII thing or a codepage thing? I tried upgrading to the latest version of freerdp in portage, but it doesn't help. I need to do this kind of thing a lot, and this is really going to get in my way...

Thanks!

G
_________________
To look without without looking within is like looking without without looking at all.
Hu
Moderator

Joined: 06 Mar 2007
Posts: 21633

Posted: Thu Feb 02, 2017 2:15 am    Post subject:

This is not an English -> Japanese problem. This is a UTF-8 vs UTF-16 problem. Microsoft systems have an unfortunate habit of trying to use UTF-16 to represent strings, unlike Linux. Mostly, this just causes problems because it gives English-speaking Windows programmers the incorrect belief that they can treat one character (in a wchar_t) as one codepoint.

Somehow you copied a string as UTF-8, then inserted those same bytes into a system that incorrectly thought it was receiving UTF-16. As a result, every pair of bytes in the input (here, every pair of ASCII characters) became a single wide character in the output. You can partially see this by copying your block text into a program that produces hexadecimal dumps, such as xxd. Take care that your tools do not further mangle the text with encoding transformations during this experiment.
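
As a rough illustration of that mix-up, the following Python sketch reinterprets the UTF-8 bytes of the first test string as little-endian UTF-16 and prints a hex dump alongside. The function name `misread_utf8_as_utf16` is made up for this example, and dropping a trailing odd byte is an assumption of the sketch; what the RDP client or Windows really does with that last byte is not known from this thread.

```python
def misread_utf8_as_utf16(text: str) -> str:
    """Decode the UTF-8 bytes of `text` as if they were UTF-16LE."""
    raw = text.encode("utf-8")
    # UTF-16 code units are two bytes wide, so ignore a trailing odd byte.
    # (Assumption for this sketch; a real receiver may handle it differently.)
    usable = raw[: len(raw) - len(raw) % 2]
    return usable.decode("utf-16-le", errors="replace")


original = "Hi Everybody!"
garbled = misread_utf8_as_utf16(original)

# Hex dump of what was actually copied: one byte per ASCII character.
print(original.encode("utf-8").hex(" "))
# Each pair of bytes became one 16-bit code unit in the CJK range.
print([f"U+{ord(ch):04X}" for ch in garbled])
print(garbled)
```

If the theory is right, the printed characters should match the start of the text that showed up on System B.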

I cannot explain why this happens to you or help you solve it, but I hope that knowing your problem is about UTF-8 vs UTF-16 will guide your investigation down the right path.
grooveman
Veteran

Joined: 24 Feb 2003
Posts: 1217

Posted: Thu Feb 02, 2017 12:55 pm    Post subject:

Hi Hu,

Thanks for the response.

I figured it was something like that... but didn't know Windows used UTF-16.

So I conducted one more experiment that yields a workaround (because I don't see this getting fixed, unless I can set my Linux box to UTF-16).

As I established above, if I cut from Server A and paste into Server B, I get Japanese characters (or vice versa). But, if I cut from Server A, paste into my Linux system (from which I'm doing all this remoting), then cut that and paste it into Server B, it comes out using English letters (as was my intent).

So, I'll just have to cut my code from my Windows servers, paste it into KWrite (or something similar), then cut it from there and paste it into the destination Windows server. I don't like it... but at least it is a way around.

Incidentally, is there a way to set my Linux system to UTF-16? What complications would that cause?

Thanks.

G
_________________
To look without without looking within is like looking without without looking at all.
Chiitoo
Administrator

Joined: 28 Feb 2010
Posts: 2575
Location: Here and Away Again

Posted: Thu Feb 02, 2017 1:45 pm    Post subject: ><)))°€

This might be somewhat pedantic, and I could be wrong, but that looks more Chinese to me, though Japanese has at least some of those, too. :]

(Not that I can really read it.)


As for the issue, unfortunately I don't have any ideas about what is going on there. I might look into experimenting with locale settings (see for example the Localization Guide at our wiki).
_________________
Kindest of regardses.
grooveman
Veteran

Joined: 24 Feb 2003
Posts: 1217

Posted: Thu Feb 02, 2017 5:16 pm    Post subject:

Google Translate auto-detected it as Japanese. I defer to them, as I am not an expert.

I will revisit the localization guide.

Thanks.

G
_________________
To look without without looking within is like looking without without looking at all.
Chiitoo
Administrator

Joined: 28 Feb 2010
Posts: 2575
Location: Here and Away Again

Posted: Thu Feb 02, 2017 10:20 pm    Post subject:

Heh. That's interesting.

I gave Google Translate a go as well, and to me, it says Chinese. Perhaps it depends on the time of the day, or alignment of things... or both.

OK now it gave me English, Korean, and Japanese, too... haha.
_________________
Kindest of regardses.
Hu
Moderator

Joined: 06 Mar 2007
Posts: 21633

Posted: Fri Feb 03, 2017 2:25 am    Post subject:

The fix is that your RDP client needs to handle transforming the text between UTF-8 / UTF-16 or it needs to properly advise the server about what type of text is being processed, so that the server handles the transformation. I am rather surprised that this ever made it into a release. I suspect you have some unusual configuration that is otherwise untested.

Linux treats strings as a null-terminated sequence of octets. You cannot use UTF-16 in such a system because the zero high byte of the first character that has one (every ASCII character, for example) would be interpreted as the null terminator for the string. You can use UTF-8 to represent codepoints that are not part of the Latin1 encoding. Your tools should automatically re-encode between UTF-16 (Windows) and UTF-8 (everyone else) as needed and transparently to the user.
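
To make the null-terminator point concrete, here is a small Python sketch. The helper `c_string_view` is hypothetical and only mimics how a C-style reader stops at the first zero byte; `to_windows_text` and `from_windows_text` are illustrative stand-ins for the re-encoding an RDP client would need to do, and are not FreeRDP functions.

```python
def c_string_view(raw: bytes) -> bytes:
    """Return the bytes a NUL-terminated string reader would see."""
    return raw.split(b"\x00", 1)[0]


def to_windows_text(text: str) -> bytes:
    """Re-encode text as little-endian UTF-16 (no BOM)."""
    return text.encode("utf-16-le")


def from_windows_text(raw: bytes) -> str:
    """Decode little-endian UTF-16 back into a Python string."""
    return raw.decode("utf-16-le")


message = "Hi Dr. Nick!"
as_utf16 = to_windows_text(message)
as_utf8 = message.encode("utf-8")

# Every ASCII character gets a zero high byte in UTF-16, so a C-style
# reader stops after the very first character.
print(c_string_view(as_utf16))   # b'H'
# UTF-8 never produces a zero byte except for U+0000 itself.
print(c_string_view(as_utf8))    # b'Hi Dr. Nick!'
# A correct round trip through UTF-16 preserves the text.
print(from_windows_text(as_utf16) == message)  # True
```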
Tony0945
Watchman

Joined: 25 Jul 2006
Posts: 5127
Location: Illinois, USA

Posted: Fri Feb 03, 2017 2:41 pm    Post subject: Re: ><)))°€

Chiitoo wrote:
This might be somewhat pedantic, and I could be wrong, but that looks more Chinese to me, though Japanese has at least some of those, too. :]
A Chinese friend (born in China, grad school in USA) who speaks both Cantonese and Mandarin, informs me that both languages AND Japanese use the exact same characters although the verbal words differ considerably in all three languages. He can't speak Japanese but can read a Japanese newspaper.
Chiitoo
Administrator

Joined: 28 Feb 2010
Posts: 2575
Location: Here and Away Again

Posted: Fri Feb 03, 2017 3:39 pm    Post subject: Re: ><)))°€

Tony0945 wrote:
A Chinese friend (born in China, grad school in USA) who speaks both Cantonese and Mandarin, informs me that both languages AND Japanese use the exact same characters although the verbal words differ considerably in all three languages. He can't speak Japanese but can read a Japanese newspaper.

Indeed!

I did not mean to say they don't use them... that was bad wording by me.

What I believe I meant was that it didn't look like what I'd expect of Japanese text. That is, there are no hiragana or katakana at all. That doesn't really tell much, either, since it's probably not like there was actual translation happening... but still.

I guess to be the most correct in this case, would be to not call it Chinese or Japanese, but hànzì? (Not that it really matters... and calling it Japanese or Chinese is perhaps more descriptive to most readers...)

Thanks for pointing that out. I did realise I had some muddled thoughts going on over there. :S


A bit more on the topic... I can pretty much exactly reproduce the results by saving the text into a file as UTF-8, then viewing it in UTF-16 (simply switching it via KWrite Tools).
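
The same experiment can be scripted. This Python sketch writes the second test string to a temporary file as UTF-8, reads it back as UTF-16LE, and then undoes the damage by reversing the two steps. The little-endian byte order is an assumption (it is what Windows uses), and the recovery only works when no bytes were lost or replaced along the way.

```python
import os
import tempfile

original = "Hi Dr. Nick!"

# Save as UTF-8, then open the same bytes as little-endian UTF-16.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(original)
    with open(path, "r", encoding="utf-16-le") as handle:
        garbled = handle.read()
finally:
    os.remove(path)

# Reverse the mistake: back to the raw bytes, then decode them correctly.
recovered = garbled.encode("utf-16-le").decode("utf-8")

print([f"U+{ord(ch):04X}" for ch in garbled])
print(recovered)
```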

I do remember now seeing it happen before, when my application/system encoding settings weren't too sane...
_________________
Kindest of regardses.
All times are GMT