Java: Methods of removing accented characters not working

coltson · n00b Joined: 15 Oct 2005 Posts: 54

Hi, I have been trying to remove accented characters from strings, replacing them with their unaccented equivalents.
The problem is that it is not working. And the problem isn't the algorithm, because I tried three different algorithms using copy and paste. Here is part of the output:

Voltago · Posted: Sun May 26, 2013 10:15 pm

Coincidentally, I recently wrote something quite similar to replace certain Unicode characters with their LaTeX representation, and it worked on my system. Perhaps you can try copying and pasting that code; if it produces the same errors as your own, have a look at your system locale (command "locale") to see whether the character set you are using can handle those symbols. If you haven't already, switching to a UTF-8 locale may solve your problem (there is a document about it in the official Gentoo documentation, iirc).
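
For comparison, here is a minimal sketch of one common way to strip accents in Java, using java.text.Normalizer. It is not the code from either post, only an illustration; the class name AccentStripper and the method stripAccents are made up for this example. The accented characters are written as Unicode escapes so the result does not depend on the encoding of the source file, and main also prints the JVM default charset, which usually follows the system locale.

```java
import java.nio.charset.Charset;
import java.text.Normalizer;

public class AccentStripper {

    // Hypothetical helper: decompose each accented letter into its base
    // letter plus combining marks (NFD), then delete the combining marks.
    public static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // The default charset is what the JVM uses when printing to the
        // terminal; a non-UTF-8 value here can explain garbled output.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // e-acute, a-grave, c-cedilla, o-tilde, given as escapes.
        String sample = "\u00e9\u00e0\u00e7\u00f5";
        System.out.println(stripAccents(sample));
    }
}
```

If stripAccents returns the expected plain letters but the terminal still shows question marks for accented input, the conversion is probably fine and the locale or charset is the more likely culprit.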

EDIT: Seeing as you have already tried different code snippets, and as one accented character is converted to two characters '??', this is probably related to your chosen system locale. In UTF-8, those characters are represented by two (or more) bytes, while the standard ASCII ones take one byte each. I think your conversion works correctly, but your system is not able to parse those two-byte characters and gives you two question marks instead.
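
The byte counts described above can be checked with a small sketch. The class name EncodingDemo and its methods are hypothetical, and decoding the bytes as US-ASCII is only a stand-in for "a system that cannot handle UTF-8"; the real cause on a given machine may be a different charset.

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Number of bytes the string occupies when encoded as UTF-8.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Encode as UTF-8, then decode the same bytes as US-ASCII, imitating
    // a consumer that does not understand multi-byte sequences.
    public static String decodeUtf8AsAscii(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String plain = "e";
        String accented = "\u00e9"; // e-acute, written as an escape

        System.out.println("UTF-8 bytes for plain e:  " + utf8Length(plain));
        System.out.println("UTF-8 bytes for e-acute:  " + utf8Length(accented));

        // Each of the two UTF-8 bytes is invalid as ASCII, so the decoder
        // substitutes one replacement character per byte.
        String mangled = decodeUtf8AsAscii(accented);
        System.out.println("Characters after wrong decoding: " + mangled.length());
    }
}
```

One accented character turning into two placeholder characters matches the '??' pattern in the output, which is why the locale is the first thing to check.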

The output of the "locale" command is: