coltson n00b
Joined: 15 Oct 2005 Posts: 54
|
Posted: Wed May 15, 2013 11:02 pm Post subject: Java: Methods of removing accentuated characters not working |
|
|
Hi, I have been trying to remove accented characters from strings, replacing them with their unaccented equivalents.
The problem is that it is not working, and the algorithm is not at fault: I tried three different algorithms, copied and pasted, and all of them fail the same way. Here is part of the output: Quote: | O welington capxaba ?? um pessimo jogador de futebol.
O welington capxaba ?? um pessimo jogador de futebol.
|
The first line is printed before the accented characters are removed, so it is as expected (the terminal being unable to display accented characters is irrelevant). The second line is where the problem shows: it should read "O welington capxaba e um pessimo jogador de futebol". The original string is "O welington capxaba é um pessimo jogador de futebol". I think this may be related to an earlier problem, where I was unable to compile the code. Here is an abridged version of the errors I got: Quote: | Procurador de Texto/FileManager.java:116: error: unmappable character for encoding ASCII
case '??':
^
Procurador de Texto/FileManager.java:116: error: unmappable character for encoding ASCII
case '??':
^
|
and
Quote: | Procurador de Texto/FileManager.java:116: error: unclosed character literal
case '??':
^
Procurador de Texto/FileManager.java:116: error: unclosed character literal
case '??':
^
|
This happened in lines like the case labels quoted in the errors above. These two errors occurred 88 times throughout the code, so I am not going to post them all. They always point at accented characters, which are used in one of the algorithms.
I got rid of them with Quote: | ./javac -encoding UTF8 Procurador\ de\ Texto/Main.java Procurador\ de\ Texto/FileManager.java |
So I think these two problems are related. Any ideas? |
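One way to make such case labels independent of the compiler's -encoding setting is to write the accented characters as Unicode escapes (\uXXXX), so the source file itself stays pure ASCII. The following is only a sketch under that assumption: the class EscapedAccents, its unaccent methods and the small set of letters are made up for illustration and are not taken from FileManager.java.

```java
// Sketch: the accented letters appear only as Unicode escapes, so this file
// contains nothing but ASCII and compiles the same with or without -encoding.
final class EscapedAccents {

    /** Hypothetical helper: maps a handful of accented letters to plain ones. */
    static char unaccent(char ch) {
        switch (ch) {
            case '\u00e1': // a with acute
            case '\u00e0': // a with grave
            case '\u00e2': // a with circumflex
            case '\u00e3': // a with tilde
                return 'a';
            case '\u00e9': // e with acute
            case '\u00ea': // e with circumflex
                return 'e';
            case '\u00e7': // c with cedilla
                return 'c';
            default:
                return ch; // everything else is passed through unchanged
        }
    }

    /** Applies the per-character mapping to a whole string. */
    static String unaccent(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append(unaccent(s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The input is escaped as well; prints "capxaba e um pessimo jogador"
        System.out.println(unaccent("capxaba \u00e9 um p\u00e9ssimo jogador"));
    }
}
```

This only removes the dependency on the source file encoding. Text read from files or the terminal at run time still has to be decoded with the right charset.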
|
|
|
Voltago Advocate
Joined: 02 Sep 2003 Posts: 2593 Location: userland
|
Posted: Sun May 26, 2013 10:15 pm Post subject: |
|
|
Coincidentally, I recently wrote something quite similar to replace certain Unicode characters with their LaTeX representation, and it works on my system. You could copy and paste the code below; if it produces the same errors as your own, have a look at your system locale (command "locale") to see whether the character set you are using can handle those symbols. If you haven't done so already, switching to a UTF-8 locale may solve your problem (there is a document about it in the official Gentoo documentation, iirc).
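On the Java side, the locale matters because the JVM (at least on releases before Java 18, where the default became UTF-8) derives its default charset from it, and that default is used whenever code reads a file without naming a charset, for example with new FileReader(file). A minimal sketch for checking this, assuming the text files are stored as UTF-8; the class CharsetCheck and the method readUtf8Lines are made-up names for illustration:

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class CharsetCheck {

    /** Reads all lines as UTF-8, regardless of the locale-derived default charset. */
    static List<String> readUtf8Lines(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // The charset used when none is passed explicitly.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        // Round trip through a temporary UTF-8 file (the accent is written as an escape).
        Path tmp = Files.createTempFile("accent", ".txt");
        Files.write(tmp, "capxaba \u00e9 um".getBytes(StandardCharsets.UTF_8));
        String line = readUtf8Lines(tmp).get(0);
        Files.delete(tmp);

        // 12 characters if the file was decoded correctly
        System.out.println(line.length());
    }
}
```

If the first line of output shows an ASCII charset (for example US-ASCII or ANSI_X3.4-1968), any reader that relies on the default will not decode accented characters correctly.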
EDIT: Seeing as you have already tried different code snippets, and as one accented character turns into the two characters '??', this is probably related to your system locale. In UTF-8, those characters are encoded as two bytes (or more), while the standard ASCII ones take one byte each. I think your conversion code is correct, but your system cannot interpret the two-byte sequences and gives you one question mark per byte instead.
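The two-byte effect can be reproduced in isolation. The sketch below encodes "é" as UTF-8 and then deliberately decodes the bytes as ASCII, which is roughly what happens when the charset in use does not match the data. The class TwoBytesDemo and its methods are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

final class TwoBytesDemo {

    /** Number of bytes the text occupies when encoded as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Encodes as UTF-8, then decodes the bytes as if they were ASCII. */
    static String misreadAsAscii(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // e with acute accent, written as an escape

        System.out.println(utf8Length("e"));    // 1
        System.out.println(utf8Length(eAcute)); // 2

        // Neither byte is valid ASCII, so each one becomes the replacement
        // character U+FFFD: one accented letter turns into two characters.
        String misread = misreadAsAscii(eAcute);
        System.out.println(misread.length());                        // 2
        System.out.println(Integer.toHexString(misread.charAt(0)));  // fffd
    }
}
```

Once the original character has been replaced like this, no accent-removal algorithm can recover it, since the string no longer contains "é" at all. An output stream that cannot represent the replacement character will typically print it as "?".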
On my system, the output of the "locale" command is:
Code: | LANG=en_US.utf8
LC_CTYPE="en_US.utf8"
LC_NUMERIC=en_US.utf8
LC_TIME=de_DE.utf8
LC_COLLATE="en_US.utf8"
LC_MONETARY=de_DE.utf8
LC_MESSAGES=en_US.utf8
LC_PAPER=de_DE.utf8
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT="en_US.utf8"
LC_IDENTIFICATION="en_US.utf8"
LC_ALL= |
Code: | public class EncodingTest
{
    // Replaces selected non-ASCII characters with their LaTeX representation.
    public static String encodeUnicodeToLatex(CharSequence str)
    {
        StringBuilder sb = new StringBuilder();
        String r;
        int last = -1; // index of the most recently replaced character
        int length = str.length();
        for (int i = 0; i < length; i++)
        {
            char ch = str.charAt(i);
            if (ch > 127) switch (ch)
            {
                case '’':
                    r = "'";
                    break;
                case '„':
                    r = "``";
                    break;
                case '“':
                    r = "''";
                    break;
                case 'à':
                    r = "\\`{a}";
                    break;
                case 'â':
                    r = "\\^{a}";
                    break;
                case 'ä':
                    r = "\\\"{a}";
                    break;
                case 'æ':
                    r = "{\\ae}";
                    break;
                case 'ç':
                    r = "\\c{c}";
                    break;
                case 'é':
                    r = "\\'{e}";
                    break;
                case 'è':
                    r = "\\`{e}";
                    break;
                case 'ê':
                    r = "\\^{e}";
                    break;
                case 'ë':
                    r = "\\\"{e}";
                    break;
                case 'î':
                    // circumflex, not diaeresis
                    r = "\\^{\\i}";
                    break;
                case 'ï':
                    r = "\\\"{\\i}";
                    break;
                case 'ô':
                    r = "\\^{o}";
                    break;
                case 'ö':
                    r = "\\\"{o}";
                    break;
                case 'œ':
                    r = "{\\oe}";
                    break;
                case 'ß':
                    r = "{\\ss}";
                    break;
                case 'ù':
                    r = "\\`{u}";
                    break;
                case 'û':
                    r = "\\^{u}";
                    break;
                case 'ü':
                    r = "\\\"{u}";
                    break;
                case 'ÿ':
                    r = "\\\"{y}";
                    break;
                case 'À':
                    r = "\\`{A}";
                    break;
                case 'Â':
                    r = "\\^{A}";
                    break;
                case 'Ä':
                    r = "\\\"{A}";
                    break;
                case 'Æ':
                    r = "{\\AE}";
                    break;
                case 'Ç':
                    r = "\\c{C}";
                    break;
                case 'É':
                    r = "\\'{E}";
                    break;
                case 'È':
                    r = "\\`{E}";
                    break;
                case 'Ê':
                    r = "\\^{E}";
                    break;
                case 'Ë':
                    r = "\\\"{E}";
                    break;
                case 'Î':
                    r = "\\^{I}";
                    break;
                case 'Ï':
                    r = "\\\"{I}";
                    break;
                case 'Ô':
                    r = "\\^{O}";
                    break;
                case 'Ö':
                    r = "\\\"{O}";
                    break;
                case 'Œ':
                    r = "{\\OE}";
                    break;
                case 'Ù':
                    r = "\\`{U}";
                    break;
                case 'Û':
                    r = "\\^{U}";
                    break;
                case 'Ü':
                    r = "\\\"{U}";
                    break;
                case 'Ÿ':
                    r = "\\\"{Y}";
                    break;
                default:
                    r = null; // non-ASCII character without a mapping: keep as is
            }
            else
                r = null; // plain ASCII: keep as is
            if (r != null)
            {
                // Copy the unchanged run since the last replacement, then the replacement itself.
                CharSequence substr = str.subSequence(last + 1, i);
                sb.append(substr);
                sb.append(r);
                last = i;
            }
        }
        // Copy whatever follows the last replaced character.
        CharSequence substr = str.subSequence(last + 1, length);
        sb.append(substr);
        return sb.toString();
    }

    public static void main(String[] args) throws Exception
    {
        String str = "ùûüÿàâæçéèêëïîôœÙÛÜŸÀÂÆÇÉÈÊËÏÎÔŒäöüß’„“ÄÖÜ";
        System.out.println(encodeUnicodeToLatex(str));
    }
} |
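For the original question (replacing accented characters with their unaccented equivalents rather than LaTeX commands), a different technique from the switch table above is Unicode normalization: decompose each character into base letter plus combining marks, then drop the marks. This is only a sketch of that alternative, not one of the three algorithms mentioned in the first post; the class AccentStripper and the method stripAccents are made-up names.

```java
import java.text.Normalizer;

final class AccentStripper {

    /**
     * Decomposes the text (NFD), so that an accented letter becomes a base
     * letter followed by combining marks, then removes the combining marks.
     */
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // Accents are written as escapes so the source file stays ASCII.
        String original = "O welington capxaba \u00e9 um pessimo jogador de futebol";
        // prints "O welington capxaba e um pessimo jogador de futebol"
        System.out.println(stripAccents(original));
    }
}
```

Letters that have no decomposition, such as ß, æ or œ, pass through unchanged with this approach, so a small lookup table would still be needed for those.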