[Solved] unrar and unicode characters

manywele
l33t
Posts: 743
Joined: Sat Jul 12, 2003 12:48 am
Location: Inside

Post by manywele » Sat May 14, 2022 6:55 pm

I'm trying to extract a rar whose filenames contain a c with a cedilla (ç). (The rar is here if you're interested.) I'm extracting the rar on the command line using app-arch/unrar in an rxvt-unicode terminal built with the following USE flags:

Code: Select all

[ebuild   R    ] x11-terms/rxvt-unicode-9.30::gentoo  USE="fading-colors font-styles mousewheel perl unicode3 xft -24-bit-color -256-color -blink -gdk-pixbuf -iso14755 -startup-notification"
unrar chokes on the c-cedilla and truncates everything from the c-cedilla to the end of the filename. How do I get it to either a) replace the c-cedilla with a plain 'c', b) skip the c-cedilla entirely, or c) use the c-cedilla correctly in the filename? I see that unrar has a "Specify the character set" option, but I have no idea what that means or how to use it.
Last edited by manywele on Sun May 15, 2022 5:20 pm, edited 1 time in total.
mike155
Advocate
Posts: 4438
Joined: Fri Sep 17, 2010 11:33 pm
Location: Frankfurt, Germany

Post by mike155 » Sat May 14, 2022 10:03 pm

It works for me...

Let's create a test rar file:

Code: Select all

$ cd /tmp
$ mkdir test
$ touch test/test_öäüµ_çéá.txt
$ rar a test.rar test
Let's look at the table of contents of the rar archive:

Code: Select all

$ unrar v test.rar

UNRAR 6.02 freeware      Copyright (c) 1993-2021 Alexander Roshal

Archive: test.rar
Details: RAR 5

 Attributes      Size    Packed Ratio    Date    Time   Checksum  Name
----------- ---------  -------- ----- ---------- -----  --------  ----
 -rw-r-----         0         0   0%  2022-05-14 23:52  00000000  test/test_öäüµ_çéá.txt
 drwxr-x---         0         0   0%  2022-05-14 23:52  00000000  test
----------- ---------  -------- ----- ---------- -----  --------  ----
                    0         0   0%                              2
Looks good. :)

Could it be that your problem is not related to unrar, but that your terminal is not able to display Unicode characters?

Post by manywele » Sun May 15, 2022 12:29 am

What terminal are you using? Or how do I make my terminal use unicode properly? It is called rxvt-unicode after all and claims to have unicode support. "Works for me *shrug*" isn't particularly helpful in solving my problem.
Hu
Administrator
Posts: 24395
Joined: Tue Mar 06, 2007 5:38 am

Post by Hu » Sun May 15, 2022 12:51 am

I think "works for me" is a good start. It says that the problem can be solved using existing tools, without patching, and apparently in a configuration common enough that mike155's defaults caused it to work on the first try. To be sure we are pursuing the same problem, please post the exact error message you get. It is not clear to me which of these is your issue:
  • unrar fails to create any files when you direct it to unpack the rar
  • unrar creates a file with a completely incorrect name
  • unrar creates a file with a correct name, which you are then unable to use in any of your other tools

Post by manywele » Sun May 15, 2022 2:24 am

I get no error message.
unrar successfully creates files, but every non-ASCII character is replaced with '?' in the output. For example:

Code: Select all

unrar l baobab\ roots\ and\ fruit.rar 

 Attributes      Size     Date    Time   Name
----------- ---------  ---------- -----  ----
    ..A....  18938213  2021-09-21 07:25  Baobab (Casa?ais) - Roots and Fruit/01 - Baobab (Casa?ais) - Mansa.mp3
    ..A....  14784349  2021-09-21 07:25  Baobab (Casa?ais) - Roots and Fruit/02 - Baobab (Casa?ais) - Mansan? ciss?.mp3
etc...
When I extract the archive, I get a prompt about replacing files, because unrar has truncated the directory name to "Baobab (Casa", dropping everything from the '?' onward, and has already written the first file under that truncated name.

Code: Select all

unrar x baobab\ roots\ and\ fruit.rar
Extracting  Baobab (Casa?ais) - Roots and Fruit/01 - Baobab (Casa?ais) - Mansa.mp3  OK 

Would you like to replace the existing file Baobab (Casa?ais) - Roots and Fruit/02 - Baobab (Casa?ais) - Mansan? ciss?.mp3
18938213 bytes, modified on 2021-09-21 07:25
with a new one
14784349 bytes, modified on 2021-09-21 07:25

[Y]es, [N]o, [A]ll, n[E]ver, [R]ename, [Q]uit 
The result after pressing 'Q' is that a single file has been created named "Baobab (Casa". That file is the first file in the archive. If I use "unrar e" instead of "unrar x" then I get a full rar extraction of all the files but all of the file names have been truncated at the '?'. So the first file becomes '01 - Baobab (Casa'. That is what I meant in my first post when I said the files are truncated at the unicode character.

In short, all files can be extracted. Any file with a unicode character has its filename truncated. All extracted files are of the correct type and not corrupted.

Post by Hu » Sun May 15, 2022 3:20 am

If the file's name is truncated on disk, why is the replace-file confirmation prompt using the full name? This seems more like your terminal is misrepresenting the data. What is the output of ls -l | xxd in that directory?
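
As a rough sketch of what to look for in that dump: in UTF-8, ç is stored as the two bytes c3 a7, so a correctly extracted name would show those bytes right after "Casa", while a literal '?' written to disk would show up as 3f. The helper name hex_bytes is made up for illustration, and od is used here only so the snippet does not depend on xxd being installed.

```shell
#!/bin/sh
# hex_bytes is a hypothetical helper: print the bytes of its argument as hex.
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

# Build "ç" from its UTF-8 octal escapes so the script itself stays ASCII.
c_cedilla=$(printf '\303\247')

hex_bytes "$c_cedilla"        # UTF-8 bytes of the c-cedilla
hex_bytes "Casa${c_cedilla}"  # "Casa" followed by the c-cedilla
hex_bytes "Casa?"             # what a literal question mark would look like
```

If the name on disk simply stops after 43 61 73 61 ("Casa"), with neither c3 a7 nor 3f following, the name really was cut short before it reached the filesystem.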

Post by manywele » Sun May 15, 2022 5:45 am

Hu wrote: What is the output of ls -l | xxd in that directory?

Code: Select all

ls -l | xxd
00000000: 746f 7461 6c20 3135 3137 3132 0a2d 7277  total 151712.-rw
00000010: 2d72 2d2d 722d 2d20 3120 626f 6220 626f  -r--r-- 1 bob bo
00000020: 6220 2031 3839 3338 3231 3320 5365 7020  b  18938213 Sep 
00000030: 3231 2020 3230 3231 2042 616f 6261 6220  21  2021 Baobab 
00000040: 2843 6173 610a 2d72 772d 722d 2d72 2d2d  (Casa.-rw-r--r--
00000050: 2031 2062 6f62 2062 6f62 2031 3336 3430   1 bob bob 13640
00000060: 3238 3133 2053 6570 2032 3420 2032 3032  2813 Sep 24  202
00000070: 3120 6261 6f62 6162 2072 6f6f 7473 2061  1 baobab roots a
00000080: 6e64 2066 7275 6974 2e72 6172 0a2d 7277  nd fruit.rar.-rw
00000090: 7872 2d2d 722d 2d20 3120 626f 6220 626f  xr--r-- 1 bob bo
000000a0: 6220 2020 2020 2020 3933 3920 4d61 7920  b       939 May 
000000b0: 3134 2031 313a 3331 2063 6f70 7972 6172  14 11:31 copyrar
000000c0: 6d75 7369 632e 7368 0a                   music.sh.
Edit: In case it's not clear, that directory contains the single file that was extracted by running "unrar x" on the .rar file and then choosing 'Q'.

Post by Hu » Sun May 15, 2022 3:54 pm

That output more directly confirms what your earlier post suggested, but was unclear on. The unrar tool is passing an incorrectly truncated name to the kernel, so the name recorded in the filesystem is incomplete. Your terminal has nothing to do with it. Your problem is in unrar. I don't suppose the content is also available in a more accessible format? Under what locale are you running unrar?

Post by manywele » Sun May 15, 2022 5:20 pm

The content is not available in a more accessible format. This is a guy who lives in The Netherlands, hunts down out of print LPs from around the world, mostly Africa and South America, and posts rips.

My locale was set to POSIX for some reason. Setting it to en_US.utf8 allows the extraction to work correctly. It doesn't show the correct character but at least it works.

Code: Select all

ll
total 131M
drwxr-xr-x 2 bob bob 4.0K Sep 21  2021 Baobab (Casaçais) - Roots and Fruit
Thank you for your help. I had completely forgotten about locales. I know I had it set to utf8 when I set the system up. No idea when or why I switched it to POSIX.
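
For anyone landing here with the same symptom, here is a minimal sketch of the check that would have caught this, assuming a system where `locale charmap` is available. The function name is_utf8_charmap and the archive name archive.rar are placeholders for illustration, not anything provided by unrar.

```shell
#!/bin/sh
# is_utf8_charmap is a hypothetical helper: succeed if the given charmap
# name is UTF-8 (case-insensitive, with or without the hyphen).
is_utf8_charmap() {
    case "$1" in
        [Uu][Tt][Ff]-8|[Uu][Tt][Ff]8) return 0 ;;
        *) return 1 ;;
    esac
}

# `locale charmap` prints the character encoding of the current locale,
# e.g. UTF-8, or ANSI_X3.4-1968 for the POSIX/C locale on glibc.
current=$(locale charmap 2>/dev/null)

if is_utf8_charmap "$current"; then
    echo "locale encoding is UTF-8"
else
    echo "locale encoding is '${current}', not UTF-8"
    # archive.rar is a placeholder name; the locale is the one used above.
    echo "try: LC_ALL=en_US.utf8 unrar x archive.rar"
fi
```

Setting the variable on the command line like this only affects that one invocation, which makes it a cheap way to test whether the locale is the culprit before changing the system-wide setting.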

Post by mike155 » Sun May 15, 2022 5:56 pm

manywele wrote: It doesn't show the correct character but at least it works.
Most likely, this happens because urxvt is being started in ISO-8859-1 mode.

Try

Code: Select all

LANG="en_US.utf8" urxvt
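
Before relying on that, it may be worth confirming that the locale is actually generated on the system, since a program asked to use an unavailable locale typically falls back to C. This is a small sketch, assuming `locale -a` lists the installed locales. has_locale and normalize are illustrative helper names, and the comparison ignores case and the hyphen because the same locale is commonly spelled both en_US.utf8 and en_US.UTF-8.

```shell
#!/bin/sh
# normalize: lowercase a locale name and drop hyphens, so that
# "en_US.UTF-8" and "en_US.utf8" compare equal.
normalize() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# has_locale (hypothetical helper): read a list of locale names on stdin,
# one per line, and succeed if the wanted locale is among them.
has_locale() {
    wanted=$(normalize "$1")
    while IFS= read -r name; do
        [ "$(normalize "$name")" = "$wanted" ] && return 0
    done
    return 1
}

if locale -a 2>/dev/null | has_locale "en_US.utf8"; then
    echo "en_US.utf8 is available"
else
    echo "en_US.utf8 not found in 'locale -a' output"
fi
```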

Post by manywele » Sun May 15, 2022 6:34 pm

Thank you!!! Getting character encoding to work consistently across all aspects of a Linux environment has always seemed confusing.