Creating a Unicode character from its number

Question

I want to display a Unicode character in Java. If I do this, it works just fine:

```java
String symbol = "\u2202";
```

symbol is equal to "∂". That is what I want.

The problem is that I know the Unicode number and need to create the Unicode symbol from it. I tried the (to me) obvious thing:

```java
int c = 2202;
String symbol = "\\u" + c;
```

However, in this case, symbol is equal to "\u2202", which is not what I want.

How can I construct the symbol if I know its Unicode number, but only at runtime? I can't hard-code it as in the first example.

dty · Accepted Answer

Just cast your int to a char. You can convert that to a String using Character.toString():

```java
String s = Character.toString((char) c);
```

EDIT:

Just remember that the escape sequences in Java source code (the \u bits) are in hex, so if you're trying to reproduce an escape sequence, you'll need something like int c = 0x2202.
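To make the hex-versus-decimal point concrete, here is a small sketch (the class and method names are illustrative, not from the answer) showing that the cast only produces ∂ when the int holds 0x2202, not the decimal number 2202:

```java
class CastExample {
    /** Casts an int to char and wraps it in a String (BMP code points only). */
    static String fromBmp(int c) {
        return Character.toString((char) c);
    }

    public static void main(String[] args) {
        int hex = 0x2202;   // same value as the escape in the question: 8706 in decimal
        int decimal = 2202; // a different number, 0x089A, so a different character
        System.out.println(fromBmp(hex).equals("\u2202"));     // true
        System.out.println(fromBmp(decimal).equals("\u2202")); // false
        System.out.println(hex);                               // 8706
    }
}
```

Because the cast keeps only 16 bits, this approach is limited to code points up to U+FFFF.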

McDowell · Answer

If you want to get a UTF-16 encoded code unit as a char, you can parse the integer and cast it, as others have suggested.

If you want to support all code points, use Character.toChars(int). This will handle cases where code points cannot fit in a single char value.

The docs say:

Converts the specified character (Unicode code point) to its UTF-16 representation stored in a char array. If the specified code point is a BMP (Basic Multilingual Plane or Plane 0) value, the resulting char array has the same value as codePoint. If the specified code point is a supplementary code point, the resulting char array has the corresponding surrogate pair.
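A minimal sketch of the two cases the documentation describes, using ∂ (U+2202, in the BMP) and U+1F495 (a supplementary code point); the helper name utf16 is illustrative:

```java
class ToCharsExample {
    /** Encodes a code point as UTF-16: 1 char (BMP) or 2 chars (surrogate pair). */
    static char[] utf16(int codePoint) {
        return Character.toChars(codePoint); // throws IllegalArgumentException if invalid
    }

    public static void main(String[] args) {
        char[] bmp = utf16(0x2202);            // a BMP code point
        char[] supplementary = utf16(0x1F495); // outside the BMP
        System.out.println(bmp.length);           // 1
        System.out.println(supplementary.length); // 2
        // the high and low surrogates of U+1F495
        System.out.printf("%04X %04X%n", (int) supplementary[0], (int) supplementary[1]); // D83D DC95
        System.out.println(Character.isHighSurrogate(supplementary[0])); // true
    }
}
```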

eis · Answer

The other answers here either only support Unicode up to U+FFFF (the answers dealing with a single char) or don't say how to get to the actual symbol (the answers stopping at Character.toChars() or using an incorrect method after that), so I'm adding my answer here too.

To support supplementary code points as well, do the following:

```java
// this character:
// http://www.isthisthingon.org/unicode/index.php?page=1F&subpage=4&glyph=1F495
// using code points here, not U+n notation
// for equivalence with U+n, below would be 0xnnnn (here 0x1F495)
int codePoint = 128149;
// converting to char[] pair
char[] charPair = Character.toChars(codePoint);
// and to String, containing the character we want
String symbol = new String(charPair);

// we now have symbol with the desired character as the first item
// confirm that we indeed have the character with code point 128149
System.out.println("First code point: " + symbol.codePointAt(0));
```

I also did a quick test to see which conversion methods work and which don't:

```java
int codePoint = 128149;
char[] charPair = Character.toChars(codePoint);

String str = new String(charPair, 0, 2);
System.out.println("First code point: " + str.codePointAt(0));   // 128149, worked

String str2 = charPair.toString();
System.out.println("Second code point: " + str2.codePointAt(0)); // 91, didn't work

String str3 = new String(charPair);
System.out.println("Third code point: " + str3.codePointAt(0));  // 128149, worked

String str4 = String.valueOf(codePoint);
System.out.println("Fourth code point: " + str4.codePointAt(0)); // 49, didn't work

String str5 = new String(new int[] {codePoint}, 0, 1);
System.out.println("Fifth code point: " + str5.codePointAt(0));  // 128149, worked
```
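The two failures in that test come from overloads that look similar but do something else. A short sketch (the class name is illustrative) of why str2 and str4 give 91 and 49:

```java
class ConversionPitfalls {
    /** The working route: String.valueOf(char[]) copies the chars themselves. */
    static String fromCodePoint(int codePoint) {
        return String.valueOf(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        int codePoint = 128149; // 0x1F495
        char[] charPair = Character.toChars(codePoint);

        // Arrays inherit Object.toString(): "[C@" plus a hash code,
        // so the first code point is '[' (91).
        System.out.println(charPair.toString().startsWith("[C@")); // true

        // String.valueOf(int) formats the number in decimal ("128149"),
        // so the first code point is '1' (49).
        System.out.println(String.valueOf(codePoint).codePointAt(0)); // 49

        System.out.println(fromCodePoint(codePoint).codePointAt(0)); // 128149
    }
}
```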

MeraNaamJoker · Answer

This one worked fine for me.

```java
String cc2 = "2202";
String text2 = String.valueOf(Character.toChars(Integer.parseInt(cc2, 16)));
```

Now text2 will contain ∂.
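The same idea can be wrapped in a helper. This is a sketch under assumptions: the name fromHex, the optional "U+" prefix handling, and the explicit validity check are additions for illustration, not part of the answer:

```java
class HexToSymbol {
    /**
     * Hypothetical helper: accepts "2202", "U+2202" or "1F495" and returns the
     * matching String (1 char for BMP code points, 2 for supplementary ones).
     */
    static String fromHex(String hex) {
        String digits = hex.startsWith("U+") || hex.startsWith("u+") ? hex.substring(2) : hex;
        int codePoint = Integer.parseInt(digits, 16); // NumberFormatException if not hex
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("Not a Unicode code point: " + hex);
        }
        return String.valueOf(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(fromHex("U+2202").equals(fromHex("2202"))); // true
        System.out.println(fromHex("1F495").length());                 // 2
        try {
            fromHex("110000"); // one past the largest code point
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Not a Unicode code point: 110000
        }
    }
}
```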

ILMTitan · Answer

Remember that char is an integral type, and thus can be assigned an integer value as well as a char constant.

```java
char c = 0x2202; // aka 8706 in decimal; Unicode escapes use hex code points
String s = String.valueOf(c);
```
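One caveat worth showing: the direct assignment works because 0x2202 is a compile-time constant that fits in a char. A value only known at runtime (as in the question) needs an explicit cast, which silently drops the bits above 0xFFFF. A sketch with an illustrative helper name:

```java
class CharFromInt {
    /** A runtime int needs an explicit cast; values above 0xFFFF are truncated. */
    static String fromBmp(int value) {
        char c = (char) value;
        return String.valueOf(c);
    }

    public static void main(String[] args) {
        char constant = 0x2202; // OK without a cast: constant in range 0..0xFFFF
        int runtime = Integer.parseInt("2202", 16);
        // char fromVariable = runtime; // would not compile: possible lossy conversion
        System.out.println(fromBmp(runtime).charAt(0) == constant); // true
        System.out.println(fromBmp(0x1F495).charAt(0) == 0xF495);   // true: high bits lost
    }
}
```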

Kapil K. Kushwah · Answer

```java
String st = "2202";
int cp = Integer.parseInt(st, 16); // parses st as a hexadecimal number
char[] c = Character.toChars(cp);
System.out.println(c); // prints the character corresponding to '\u2202'
```

skomisa · Answer

Although this question is old, there is a very easy way to do this in Java 11, which was released today: you can use a new overload of Character.toString():

public static String toString(int codePoint)

Returns a String object representing the specified character (Unicode code point). The result is a string of length 1 or 2, consisting solely of the specified codePoint.

Parameters:
codePoint - the codePoint to be converted

Returns:
the string representation of the specified codePoint

Throws:
IllegalArgumentException - if the specified codePoint is not a valid Unicode code point.

Since:
11

Since this method supports any Unicode code point, the length of the returned String is not necessarily 1.

The code needed for the example given in the question is simply:

```java
int codePoint = '\u2202';
String s = Character.toString(codePoint); // <<< Requires JDK 11 !!!
System.out.println(s); // Prints ∂
```

This approach offers several benefits:

It works for any Unicode code point, not just those that can be handled using a char.
It's concise, and it's easy to understand what the code is doing.
It returns the value as a String rather than a char[], which is often what you want. The answer posted by McDowell is appropriate if you want the code point returned as a char[].
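A short sketch of the JDK 11 overload next to a pre-JDK 11 equivalent built on StringBuilder.appendCodePoint (the helper names are illustrative); both reject values beyond Character.MAX_CODE_POINT with an IllegalArgumentException:

```java
class Java11ToString {
    /** JDK 11+: one call for any valid code point. */
    static String symbolOf(int codePoint) {
        return Character.toString(codePoint);
    }

    /** Equivalent for older JDKs, using StringBuilder.appendCodePoint. */
    static String symbolOfLegacy(int codePoint) {
        return new StringBuilder().appendCodePoint(codePoint).toString();
    }

    public static void main(String[] args) {
        System.out.println(symbolOf(0x2202).length());  // 1
        System.out.println(symbolOf(0x1F495).length()); // 2
        System.out.println(symbolOf(0x1F495).equals(symbolOfLegacy(0x1F495))); // true
        try {
            symbolOf(Character.MAX_CODE_POINT + 1);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: not a valid code point");
        }
    }
}
```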

Paul Reiners · Answer

Here's how you do it:

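```java
// cc holds the hex digits of U+2202 written as a decimal int:
// String.valueOf turns it into "2202", which parseInt then reads as hex.
// (Starting from 0x2202 instead would give "8706", i.e. U+8706.)
int cc = 2202;
char ccc = (char) Integer.parseInt(String.valueOf(cc), 16);
final String text = String.valueOf(ccc);
```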

This solution is from Arne Vajhøj.
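The String.valueOf/parseInt round trip reinterprets decimal digits as hex digits, so it only yields U+2202 when the int's decimal form is "2202". A sketch with an illustrative helper name makes the difference visible:

```java
class ParseRoundTrip {
    /** Reads the decimal digits of n as if they were hexadecimal digits. */
    static int decimalDigitsAsHex(int n) {
        return Integer.parseInt(String.valueOf(n), 16);
    }

    public static void main(String[] args) {
        System.out.println(decimalDigitsAsHex(2202) == 0x2202);   // true
        // 0x2202 is 8706 in decimal, so the digits "8706" become 0x8706
        System.out.println(decimalDigitsAsHex(0x2202) == 0x8706); // true
    }
}
```

If the value is already an int holding 0x2202, a plain (char) cast is enough.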

user96265 · Answer

The code below writes the 4 Unicode characters (given as decimal numbers) for the word "be" in Japanese. Yes, the verb "be" in Japanese has 4 characters! The character values are in decimal and have been read into a String[] array (using split, for example). If you have octal or hex values, parseInt also takes a radix.

```java
// Outline:
// 1. init the String[] containing the 4 code points in decimal :: intsInStrs
// 2. allocate enough chars (up to 2 per code point) :: c2s
// 3. use Integer.parseInt (with or without a radix) to get the int value
// 4. write its chars at the current position in c2s
// 5. convert the chars actually written to a String
// 6. print
String[] intsInStrs = {"12354", "12426", "12414", "12377"}; // 1.
char[] c2s = new char[intsInStrs.length * 2]; // 2. worst case: a surrogate pair per code point
int offset = 0;
for (String intString : intsInStrs) {
    int codePoint = Integer.parseInt(intString); // 3.
    // 4. toChars writes 1 char (BMP) or 2 (surrogate pair) and returns how many
    offset += Character.toChars(codePoint, c2s, offset);
}
String symbols = new String(c2s, 0, offset); // 5. skip the unused slots
System.out.println("\nLooooonger code point: " + symbols); // 6.
```

I tested it in Eclipse with Java 7, and it works. Enjoy!
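As an alternative to filling a char[] by hand, the parsed values can go into an int[] and be passed to the String(int[] codePoints, int offset, int count) constructor, which handles surrogate pairs itself. A sketch with an illustrative helper name (Java 8+ for the stream):

```java
import java.util.Arrays;

class CodePointsToString {
    /** Parses code points written in the given radix and builds one String. */
    static String fromCodePoints(String[] numbers, int radix) {
        int[] codePoints = Arrays.stream(numbers)
                .mapToInt(s -> Integer.parseInt(s, radix))
                .toArray();
        // encodes each code point as 1 or 2 chars
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String[] decimal = {"12354", "12426", "12414", "12377"};
        String word = fromCodePoints(decimal, 10);
        System.out.println(word.equals("\u3042\u308A\u307E\u3059")); // true
        System.out.println(fromCodePoints(new String[] {"2202", "1F495"}, 16).length()); // 3
    }
}
```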

fjiang_ca · Answer

Here is a block that prints the Unicode characters from \u00c0 to \u00ff:

```java
char[] ca = {'\u00c0'};
for (int i = 0; i < 4; i++) {
    for (int j = 0; j < 16; j++) {
        String sc = new String(ca);
        System.out.print(sc + " ");
        ca[0]++;
    }
    System.out.println();
}
```
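Incrementing a char only works inside the BMP. A sketch of the same table built from int code points with StringBuilder.appendCodePoint, which also covers supplementary ranges (the helper name row and the U+1F600 example range are illustrative):

```java
class CodePointRange {
    /** Returns count consecutive code points starting at start, each followed by a space. */
    static String row(int start, int count) {
        StringBuilder sb = new StringBuilder();
        for (int cp = start; cp < start + count; cp++) {
            sb.appendCodePoint(cp).append(' '); // 1 or 2 chars per code point
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // the same 4 x 16 table, U+00C0 to U+00FF
        for (int start = 0xC0; start <= 0xF0; start += 16) {
            System.out.println(row(start, 16));
        }
        // a supplementary range works the same way
        System.out.println(row(0x1F600, 16));
    }
}
```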

hariprasad · Answer

Unfortunately, removing one backslash as suggested in the first comment (newbiedoodle) does not give a good result. Most (if not all) IDEs report a syntax error. The reason is that Java's escaped Unicode format expects the syntax "\uXXXX", where XXXX are 4 hexadecimal digits, which are mandatory. Attempts to assemble this string from pieces fail. Of course, "\u" is not the same as "\\u": the first starts a Unicode escape, while the second is an escaped backslash followed by "u".

Strangely, the utility presented on the Apache pages shows exactly this behavior, but in reality it is an escape mimic utility. Apache has its own utilities that do this job for you (Apache Escape Unicode utilities; I have not tested them), though that may still not be what you want.

Still, that first utility takes a good approach to the solution. Combined with the approach described above (MeraNaamJoker), my solution is to create this mimicked escaped string and then convert it back to Unicode (to get around the restrictions on Unicode escapes). I used it for copying text, so the backslash escaping in the uencode method may need adjusting for your case. Try it.

```java
class UnicodeMimic {

    /**
     * Converts a character to the mimic Unicode escape format, e.g. \\u0020.
     * This is the format used in Java source code.
     *
     * unicodeEscaped(' ') = "\\u0020"
     * unicodeEscaped('A') = "\\u0041"
     *
     * @param ch the character to convert
     * @return the character as a mimic escaped Unicode string
     */
    public static String unicodeEscaped(char ch) {
        // a backslash followed by 'u'; a bare backslash-u literal would not compile
        final String charEsc = "\\u";
        // zero-pad the lowercase hex value to 4 digits
        return charEsc + String.format("%04x", (int) ch);
    }

    /**
     * Converts a string to the mimic Unicode escape format, e.g. \\u0020.
     * Note: real escapes cannot be used here, because the compiler (and the
     * editor, e.g. NetBeans) translates them into the actual character at
     * compile time. The mimic format is an ordinary string instead, which of
     * course does not give the same result.
     *
     * @param nationalString the string to convert
     * @return the string in Java mimic escaped Unicode form
     */
    public String encodeStr(String nationalString) {
        StringBuilder converted = new StringBuilder();
        for (int i = 0; i < nationalString.length(); i++) {
            converted.append(unicodeEscaped(nationalString.charAt(i)));
        }
        return converted.toString();
    }

    /**
     * Converts a string from the mimic Unicode escape format, e.g. \\u0020,
     * back to ordinary characters.
     *
     * @param escapedString the string in Java mimic escaped Unicode form
     * @return the decoded string
     */
    public String uencodeStr(String escapedString) {
        StringBuilder converted = new StringBuilder();
        // split() takes a regex: "\\\\u" matches a literal backslash followed by 'u'
        String[] arrStr = escapedString.split("\\\\u");
        for (int i = 1; i < arrStr.length; i++) {
            String str = arrStr[i];
            if (!str.isEmpty()) {
                int codePoint = Integer.parseInt(str, 16);
                converted.append(Character.toChars(codePoint));
            }
        }
        return converted.toString();
    }
}
```
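A more compact variant of the decode step, sketched with java.util.regex: it replaces each backslash-u-plus-four-hex-digits sequence in place and leaves any other text untouched. The class and method names are illustrative, and Matcher.appendReplacement(StringBuilder, String) requires Java 9 or later:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnescapeSketch {
    // a literal backslash, 'u', then exactly four hex digits (captured)
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    /** Hypothetical helper: turns each mimic escape in s back into its char. */
    static String unescape(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String escaped = "\\u" + "2202"; // built at runtime, as in the question
        System.out.println(unescape(escaped).equals("\u2202")); // true
    }
}
```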