Zucca Moderator
Joined: 14 Jun 2007 Posts: 3343 Location: Rasi, Finland
Posted: Tue Feb 19, 2019 3:55 pm Post subject: Pattern match and regexp match can't handle this character?

I stumbled upon this zip (a Doom2 wad).
I've been using app-arch/unzip to unpack the archive.
The file names clearly end with '.txt' and '.wad', but find won't match them with -name/-iname. Trying find's -regex/-iregex doesn't help either (I know the regex has to match the whole file path).
Then I started to investigate. I ran find and tried to grep its output to see how it behaves...
The character doesn't even match the '.' (dot) regular expression. I even tried something like this: Code: | find . -regextype egrep -iregex '([.]|[^.])*' | ... which should match any path at all, yet these files are still left out. So the character somehow breaks both pattern matching and regular expression matching.
Then, at the "last moment", I decided to try setting LC_ALL="C", and it works.
Can somebody tell me what's going on here? _________________ ..: Zucca :..
Gentoo IRC channels reside on Libera.Chat.
--
Quote: | I am NaN! I am a man! |
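The effect can be illustrated outside find. The Python sketch below is only an analogy, not find's or glibc's actual code path, and it assumes a hypothetical name containing the byte 0x81 (the real byte from the wad isn't known). Matching on raw bytes, roughly how the single-byte C locale treats a name, still sees the '.txt' suffix, while a matcher that must first decode the name as UTF-8 has nothing valid to match against.

```python
import fnmatch

# Hypothetical filename: 0x81 is a UTF-8 continuation byte with no lead
# byte in front of it, so this name is not valid UTF-8.
name = b"readme\x81.txt"

def matches_as_bytes(raw: bytes, pattern: bytes) -> bool:
    """Glob-match byte by byte, similar in spirit to the C locale."""
    return fnmatch.fnmatchcase(raw, pattern)

def matches_as_utf8(raw: bytes, pattern: str) -> bool:
    """Decode strictly as UTF-8 first; an undecodable name never matches."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return fnmatch.fnmatchcase(text, pattern)

if __name__ == "__main__":
    print("byte-wise match:", matches_as_bytes(name, b"*.txt"))  # True
    print("UTF-8 match:    ", matches_as_utf8(name, "*.txt"))    # False
```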
mike155 Advocate
Joined: 17 Sep 2010 Posts: 4438 Location: Frankfurt, Germany
Posted: Tue Feb 19, 2019 4:08 pm Post subject:

I can't download the file. Maybe it contains an illegal UTF-8 character in a filename?
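One way to test that guess is to list every name under a directory that fails strict UTF-8 decoding. This is a minimal Python sketch (invalid_utf8_names is a made-up helper, not an existing tool); it walks with a bytes path so nothing gets decoded behind the scenes, and prints each hit with errors="replace", which renders every invalid sequence as U+FFFD.

```python
import os
import sys

def invalid_utf8_names(top: bytes):
    """Yield paths (as bytes) whose last component is not valid UTF-8."""
    # A bytes argument makes os.walk return raw bytes names.
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                yield os.path.join(dirpath, name)

if __name__ == "__main__":
    top = os.fsencode(sys.argv[1] if len(sys.argv) > 1 else ".")
    for path in invalid_utf8_names(top):
        # errors="replace" substitutes U+FFFD for each invalid sequence,
        # much like what a UTF-8 terminal typically displays.
        print(path.decode("utf-8", errors="replace"))
```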
Zucca Moderator
Joined: 14 Jun 2007 Posts: 3343 Location: Rasi, Finland
Posted: Tue Feb 19, 2019 4:42 pm Post subject:

You're correct.
I just found out it shows up as '\ufffd' (U+FFFD, the Unicode replacement character), so the name contains bytes that aren't valid UTF-8.
I think I'll write something that renames files with illegal characters in their names...
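A sketch of that rename idea in Python. It assumes, without confirmation from the thread, that the non-UTF-8 names are CP437 (the usual DOS code page) and re-encodes them as UTF-8. fix_name and rename_tree are hypothetical helpers, and dry_run defaults to True so the plan can be inspected before anything is renamed.

```python
import os
import sys

# ASSUMPTION: names that aren't valid UTF-8 are CP437 (DOS era).
# Swap in "latin-1", "cp850", etc. if the results look wrong.
LEGACY_ENCODING = "cp437"

def fix_name(raw: bytes, legacy: str = LEGACY_ENCODING) -> bytes:
    """Return raw unchanged if it is UTF-8, else re-decode it as legacy."""
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # cp437 maps all 256 byte values, so this decode cannot fail.
        return raw.decode(legacy).encode("utf-8")

def rename_tree(top: bytes, dry_run: bool = True) -> list:
    """Rename invalid UTF-8 entries under top; return (old, new) pairs."""
    done = []
    # topdown=False visits children before their parent directory, so a
    # directory is renamed only after everything inside it was handled.
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        for name in filenames + dirnames:
            new = fix_name(name)
            if new == name:
                continue
            src = os.path.join(dirpath, name)
            dst = os.path.join(dirpath, new)
            if os.path.exists(dst):
                print(f"skip, target exists: {dst!r}", file=sys.stderr)
                continue
            if not dry_run:
                os.rename(src, dst)
            done.append((src, dst))
    return done
```

Usage: rename_tree(os.fsencode("path/to/unpacked/wad")) returns the planned pairs; call it again with dry_run=False to apply them. If the new names look like gibberish, the encoding guess is wrong.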
Ant P. Watchman
Joined: 18 Apr 2009 Posts: 6920
Posted: Tue Feb 19, 2019 6:48 pm Post subject:

app-misc/detox
Zucca Moderator
Joined: 14 Jun 2007 Posts: 3343 Location: Rasi, Finland
Posted: Tue Feb 19, 2019 9:38 pm Post subject:

Ant P. wrote: | app-misc/detox |
Thanks. I think it beats perl-rename. :)
Zucca Moderator
Joined: 14 Jun 2007 Posts: 3343 Location: Rasi, Finland
Posted: Tue Feb 19, 2019 9:58 pm Post subject:

I didn't get -r working with detox, but with find ... -exec detox ... {} + it's perfectly usable.
Now I need to learn it a bit better. Why haven't I heard of this before?
bunder Bodhisattva
Joined: 10 Apr 2004 Posts: 5934
Posted: Wed Feb 20, 2019 2:15 am Post subject:

Found this in a PDF about migrating from NFSv3 to NFSv4...
Quote: | Internationalization support; UTF-8
NFSv4 uses UTF-8 for file names, directories, symlinks and user and group identifiers. As UTF-8 is backwards compatible with 7 bit encoded ASCII, any names that are 7 bit ASCII will continue to work. However, pre-existing names that contain 8 bit characters will be misinterpreted by NFSv4 as UTF-8 multibyte characters, which may result in errors such as not finding files.
For example, an NFSv3 file created with the name René contains an 8 bit ASCII character in the last position. NFSv4 will assume that the é indicates a multibyte UTF-8 encoding, which will lead to unexpected results.
•Action:
review existing NFSv3 names to ensure that they are 7 bit ASCII clean. |
Maybe the filename encoding predates proper internationalization support. _________________
Neddyseagoon wrote: | The problem with leaving is that you can only do it once and it reduces your influence. |
banned from #gentoo since sept 2017 |
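The René example from that quote can be checked byte by byte in Python. The quote only says "8 bit", so treating the legacy name as ISO 8859-1 (Latin-1) is an assumption here:

```python
# ASSUMPTION: the 8-bit name was written as ISO 8859-1 (Latin-1).
name = "René"
latin1_bytes = name.encode("latin-1")
utf8_bytes = name.encode("utf-8")

print(latin1_bytes)  # b'Ren\xe9'      (é is one byte, 0xE9)
print(utf8_bytes)    # b'Ren\xc3\xa9'  (é is two bytes, 0xC3 0xA9)

# 0xE9 looks like the lead byte of a three-byte UTF-8 sequence, but no
# continuation bytes follow, so strict UTF-8 decoding rejects the name.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)

# A lenient decoder substitutes U+FFFD, the character found earlier in the thread.
print(ascii(latin1_bytes.decode("utf-8", errors="replace")))  # 'Ren\ufffd'
```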
Zucca Moderator
Joined: 14 Jun 2007 Posts: 3343 Location: Rasi, Finland
Posted: Wed Feb 20, 2019 2:41 am Post subject:

bunder wrote: | maybe the filename encoding predates proper internationalization support. |
Well... It's a Doom2 wad...