How to replace special characters with the ones they're based on in PHP?

Question

How can I replace:

"ã" with "a"
"é" with "e"

in PHP? Is this possible? I read somewhere that I could do some math with the ASCII value of the base character and the ASCII value of the accent, but I can't find any references now.

McPherrinM · Accepted Answer

This answer is incorrect. I didn't understand Unicode normalization when I wrote it. See francadaval's comment and link.

Check out the Normalizer class to do this. The documentation is good, so I'll just link to it instead of repeating things here:

http://www.php.net/manual/en/class.normalizer.php

Specifically, the normalize member of that class:

http://www.php.net/manual/en/normalizer.normalize.php

Note that Unicode normalization has several forms, and you seem to want compatibility decomposition, Normalization Form KD (NFKD), though you should read the documentation to make sure.

You shouldn't try to roll your own function for this: there are far too many things that can go wrong, and using the provided function is a much better idea.
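As a minimal sketch of what this can look like in practice, assuming the intl extension is installed (the helper name `strip_accents` is made up for illustration): normalization on its own only splits "é" into "e" plus a combining accent, so the sketch adds a second step that deletes the combining marks. It uses canonical decomposition (NFD); NFKD would additionally fold compatibility characters such as ligatures.

```php
<?php
// Hypothetical helper, not part of PHP: decompose, then drop combining marks.
function strip_accents(string $text): string
{
    // Without the intl extension there is no Normalizer class; return the input unchanged.
    if (!class_exists('Normalizer')) {
        return $text;
    }

    // NFD splits "é" (U+00E9) into "e" (U+0065) + combining acute accent (U+0301).
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $text; // normalization failed, e.g. the input was not valid UTF-8
    }

    // \p{Mn} matches non-spacing (combining) marks; the u flag enables UTF-8 mode.
    $stripped = preg_replace('/\p{Mn}+/u', '', $decomposed);

    return $stripped === null ? $text : $stripped;
}

echo strip_accents('ã é'), "\n";
```

Letters that are not a base letter plus a combining mark (for example "ø" or "ß") are left as they are by this approach and would need an explicit replacement table.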

Alix Axel · Answer

If you don't have access to the Normalizer class, or simply don't want to use it, you can use the following function to replace most (all?) of the common accented characters.

function Unaccent($string)
{
    return preg_replace(
        '~&([a-z]{1,2})(acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i',
        '$1',
        htmlentities($string, ENT_QUOTES, 'UTF-8')
    );
}
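To see why this works, it may help to look at the intermediate step. A rough sketch (the variable names are only for illustration): htmlentities() turns each accented letter into a named entity such as &atilde; or &eacute;, whose name begins with the base letter, and the regular expression keeps just that letter.

```php
<?php
$input = 'ãé';

// Step 1: accented letters become named HTML entities.
$entities = htmlentities($input, ENT_QUOTES, 'UTF-8');
echo $entities, "\n"; // &atilde;&eacute;

// Step 2: keep the leading letter(s) of each entity name and drop the accent suffix.
$plain = preg_replace(
    '~&([a-z]{1,2})(acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i',
    '$1',
    $entities
);
echo $plain, "\n"; // ae
```

One side effect worth knowing: htmlentities() also encodes characters such as & and <, and those entities (&amp;, &lt;) are left in the result.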

rcaceres · Answer

For those who don't have PHP 5.3, I found another solution that works well and seems very complete. Here is a link to the author's website: http://www.evaisse.net/2008/php-translit-remove-accent-unaccent-21001 . Here is the function.

/**
 * Unaccent the input string. An example string like `ÀØėÿᾜὨζὅБю`
 * will be translated to `AOeyIOzoBY`. More complete than :
 *   strtr( (string)$str,
 *          "ÀÁÂÃÄÅàáâãäåÒÓÔÕÖØòóôõöøÈÉÊËèéêëÇçÌÍÎÏìíîïÙÚÛÜùúûüÿÑñ",
 *          "aaaaaaaaaaaaooooooooooooeeeeeeeecciiiiiiiiuuuuuuuuynn" );
 *
 * @param $str input string
 * @param $utf8 if null, function will detect input string encoding
 * @author http://www.evaisse.net/2008/php-translit-remove-accent-unaccent-21001
 * @return string input string without accent
 */
function remove_accents( $str, $utf8=true )
{
    $str = (string)$str;
    if( is_null($utf8) ) {
        // use mb_detect_encoding() when it is available, otherwise check the bytes by hand
        if( function_exists('mb_detect_encoding') ) {
            $utf8 = (strtolower( mb_detect_encoding($str) )=='utf-8');
        } else {
            $length = strlen($str);
            $utf8 = true;
            for ($i=0; $i < $length; $i++) {
                $c = ord($str[$i]);
                if ($c < 0x80) $n = 0; # 0bbbbbbb
                elseif (($c & 0xE0) == 0xC0) $n=1; # 110bbbbb
                elseif (($c & 0xF0) == 0xE0) $n=2; # 1110bbbb
                elseif (($c & 0xF8) == 0xF0) $n=3; # 11110bbb
                elseif (($c & 0xFC) == 0xF8) $n=4; # 111110bb
                elseif (($c & 0xFE) == 0xFC) $n=5; # 1111110b
                else { $utf8 = false; break; } # Does not match any model
                for ($j=0; $j<$n; $j++) { # n bytes matching 10bbbbbb follow ?
if ((++$i == $length) || ((ord($str[$i]) & 0xC0) != 0x80)) { $utf8 = false; break; } } } } } if(!$utf8) $str = utf8_encode($str); $transliteration = array( 'Ĳ' => 'I', 'Ö' => 'O','Œ' => 'O','Ü' => 'U','ä' => 'a','æ' => 'a', 'ĳ' => 'i','ö' => 'o','œ' => 'o','ü' => 'u','ß' => 's','ſ' => 's', 'À' => 'A','Á' => 'A','Â' => 'A','Ã' => 'A','Ä' => 'A','Å' => 'A', 'Æ' => 'A','Ā' => 'A','Ą' => 'A','Ă' => 'A','Ç' => 'C','Ć' => 'C', 'Č' => 'C','Ĉ' => 'C','Ċ' => 'C','Ď' => 'D','Đ' => 'D','È' => 'E', 'É' => 'E','Ê' => 'E','Ë' => 'E','Ē' => 'E','Ę' => 'E','Ě' => 'E', 'Ĕ' => 'E','Ė' => 'E','Ĝ' => 'G','Ğ' => 'G','Ġ' => 'G','Ģ' => 'G', 'Ĥ' => 'H','Ħ' => 'H','Ì' => 'I','Í' => 'I','Î' => 'I','Ï' => 'I', 'Ī' => 'I','Ĩ' => 'I','Ĭ' => 'I','Į' => 'I','İ' => 'I','Ĵ' => 'J', 'Ķ' => 'K','Ľ' => 'K','Ĺ' => 'K','Ļ' => 'K','Ŀ' => 'K','Ł' => 'L', 'Ñ' => 'N','Ń' => 'N','Ň' => 'N','Ņ' => 'N','Ŋ' => 'N','Ò' => 'O', 'Ó' => 'O','Ô' => 'O','Õ' => 'O','Ø' => 'O','Ō' => 'O','Ő' => 'O', 'Ŏ' => 'O','Ŕ' => 'R','Ř' => 'R','Ŗ' => 'R','Ś' => 'S','Ş' => 'S', 'Ŝ' => 'S','Ș' => 'S','Š' => 'S','Ť' => 'T','Ţ' => 'T','Ŧ' => 'T', 'Ț' => 'T','Ù' => 'U','Ú' => 'U','Û' => 'U','Ū' => 'U','Ů' => 'U', 'Ű' => 'U','Ŭ' => 'U','Ũ' => 'U','Ų' => 'U','Ŵ' => 'W','Ŷ' => 'Y', 'Ÿ' => 'Y','Ý' => 'Y','Ź' => 'Z','Ż' => 'Z','Ž' => 'Z','à' => 'a', 'á' => 'a','â' => 'a','ã' => 'a','ā' => 'a','ą' => 'a','ă' => 'a', 'å' => 'a','ç' => 'c','ć' => 'c','č' => 'c','ĉ' => 'c','ċ' => 'c', 'ď' => 'd','đ' => 'd','è' => 'e','é' => 'e','ê' => 'e','ë' => 'e', 'ē' => 'e','ę' => 'e','ě' => 'e','ĕ' => 'e','ė' => 'e','ƒ' => 'f', 'ĝ' => 'g','ğ' => 'g','ġ' => 'g','ģ' => 'g','ĥ' => 'h','ħ' => 'h', 'ì' => 'i','í' => 'i','î' => 'i','ï' => 'i','ī' => 'i','ĩ' => 'i', 'ĭ' => 'i','į' => 'i','ı' => 'i','ĵ' => 'j','ķ' => 'k','ĸ' => 'k', 'ł' => 'l','ľ' => 'l','ĺ' => 'l','ļ' => 'l','ŀ' => 'l','ñ' => 'n', 'ń' => 'n','ň' => 'n','ņ' => 'n','ŉ' => 'n','ŋ' => 'n','ò' => 'o', 'ó' => 'o','ô' => 'o','õ' => 'o','ø' => 'o','ō' => 'o','ő' => 'o', 'ŏ' => 'o','ŕ' => 'r','ř' => 
'r','ŗ' => 'r','ś' => 's','š' => 's', 'ť' => 't','ù' => 'u','ú' => 'u','û' => 'u','ū' => 'u','ů' => 'u', 'ű' => 'u','ŭ' => 'u','ũ' => 'u','ų' => 'u','ŵ' => 'w','ÿ' => 'y', 'ý' => 'y','ŷ' => 'y','ż' => 'z','ź' => 'z','ž' => 'z','Α' => 'A', 'Ά' => 'A','Ἀ' => 'A','Ἁ' => 'A','Ἂ' => 'A','Ἃ' => 'A','Ἄ' => 'A', 'Ἅ' => 'A','Ἆ' => 'A','Ἇ' => 'A','ᾈ' => 'A','ᾉ' => 'A','ᾊ' => 'A', 'ᾋ' => 'A','ᾌ' => 'A','ᾍ' => 'A','ᾎ' => 'A','ᾏ' => 'A','Ᾰ' => 'A', 'Ᾱ' => 'A','Ὰ' => 'A','ᾼ' => 'A','Β' => 'B','Γ' => 'G','Δ' => 'D', 'Ε' => 'E','Έ' => 'E','Ἐ' => 'E','Ἑ' => 'E','Ἒ' => 'E','Ἓ' => 'E', 'Ἔ' => 'E','Ἕ' => 'E','Ὲ' => 'E','Ζ' => 'Z','Η' => 'I','Ή' => 'I', 'Ἠ' => 'I','Ἡ' => 'I','Ἢ' => 'I','Ἣ' => 'I','Ἤ' => 'I','Ἥ' => 'I', 'Ἦ' => 'I','Ἧ' => 'I','ᾘ' => 'I','ᾙ' => 'I','ᾚ' => 'I','ᾛ' => 'I', 'ᾜ' => 'I','ᾝ' => 'I','ᾞ' => 'I','ᾟ' => 'I','Ὴ' => 'I','ῌ' => 'I', 'Θ' => 'T','Ι' => 'I','Ί' => 'I','Ϊ' => 'I','Ἰ' => 'I','Ἱ' => 'I', 'Ἲ' => 'I','Ἳ' => 'I','Ἴ' => 'I','Ἵ' => 'I','Ἶ' => 'I','Ἷ' => 'I', 'Ῐ' => 'I','Ῑ' => 'I','Ὶ' => 'I','Κ' => 'K','Λ' => 'L','Μ' => 'M', 'Ν' => 'N','Ξ' => 'K','Ο' => 'O','Ό' => 'O','Ὀ' => 'O','Ὁ' => 'O', 'Ὂ' => 'O','Ὃ' => 'O','Ὄ' => 'O','Ὅ' => 'O','Ὸ' => 'O','Π' => 'P', 'Ρ' => 'R','Ῥ' => 'R','Σ' => 'S','Τ' => 'T','Υ' => 'Y','Ύ' => 'Y', 'Ϋ' => 'Y','Ὑ' => 'Y','Ὓ' => 'Y','Ὕ' => 'Y','Ὗ' => 'Y','Ῠ' => 'Y', 'Ῡ' => 'Y','Ὺ' => 'Y','Φ' => 'F','Χ' => 'X','Ψ' => 'P','Ω' => 'O', 'Ώ' => 'O','Ὠ' => 'O','Ὡ' => 'O','Ὢ' => 'O','Ὣ' => 'O','Ὤ' => 'O', 'Ὥ' => 'O','Ὦ' => 'O','Ὧ' => 'O','ᾨ' => 'O','ᾩ' => 'O','ᾪ' => 'O', 'ᾫ' => 'O','ᾬ' => 'O','ᾭ' => 'O','ᾮ' => 'O','ᾯ' => 'O','Ὼ' => 'O', 'ῼ' => 'O','α' => 'a','ά' => 'a','ἀ' => 'a','ἁ' => 'a','ἂ' => 'a', 'ἃ' => 'a','ἄ' => 'a','ἅ' => 'a','ἆ' => 'a','ἇ' => 'a','ᾀ' => 'a', 'ᾁ' => 'a','ᾂ' => 'a','ᾃ' => 'a','ᾄ' => 'a','ᾅ' => 'a','ᾆ' => 'a', 'ᾇ' => 'a','ὰ' => 'a','ᾰ' => 'a','ᾱ' => 'a','ᾲ' => 'a','ᾳ' => 'a', 'ᾴ' => 'a','ᾶ' => 'a','ᾷ' => 'a','β' => 'b','γ' => 'g','δ' => 'd', 'ε' => 'e','έ' => 'e','ἐ' => 'e','ἑ' => 'e','ἒ' => 'e','ἓ' => 'e', 'ἔ' => 'e','ἕ' => 
'e','ὲ' => 'e','ζ' => 'z','η' => 'i','ή' => 'i', 'ἠ' => 'i','ἡ' => 'i','ἢ' => 'i','ἣ' => 'i','ἤ' => 'i','ἥ' => 'i', 'ἦ' => 'i','ἧ' => 'i','ᾐ' => 'i','ᾑ' => 'i','ᾒ' => 'i','ᾓ' => 'i', 'ᾔ' => 'i','ᾕ' => 'i','ᾖ' => 'i','ᾗ' => 'i','ὴ' => 'i','ῂ' => 'i', 'ῃ' => 'i','ῄ' => 'i','ῆ' => 'i','ῇ' => 'i','θ' => 't','ι' => 'i', 'ί' => 'i','ϊ' => 'i','ΐ' => 'i','ἰ' => 'i','ἱ' => 'i','ἲ' => 'i', 'ἳ' => 'i','ἴ' => 'i','ἵ' => 'i','ἶ' => 'i','ἷ' => 'i','ὶ' => 'i', 'ῐ' => 'i','ῑ' => 'i','ῒ' => 'i','ῖ' => 'i','ῗ' => 'i','κ' => 'k', 'λ' => 'l','μ' => 'm','ν' => 'n','ξ' => 'k','ο' => 'o','ό' => 'o', 'ὀ' => 'o','ὁ' => 'o','ὂ' => 'o','ὃ' => 'o','ὄ' => 'o','ὅ' => 'o', 'ὸ' => 'o','π' => 'p','ρ' => 'r','ῤ' => 'r','ῥ' => 'r','σ' => 's', 'ς' => 's','τ' => 't','υ' => 'y','ύ' => 'y','ϋ' => 'y','ΰ' => 'y', 'ὐ' => 'y','ὑ' => 'y','ὒ' => 'y','ὓ' => 'y','ὔ' => 'y','ὕ' => 'y', 'ὖ' => 'y','ὗ' => 'y','ὺ' => 'y','ῠ' => 'y','ῡ' => 'y','ῢ' => 'y', 'ῦ' => 'y','ῧ' => 'y','φ' => 'f','χ' => 'x','ψ' => 'p','ω' => 'o', 'ώ' => 'o','ὠ' => 'o','ὡ' => 'o','ὢ' => 'o','ὣ' => 'o','ὤ' => 'o', 'ὥ' => 'o','ὦ' => 'o','ὧ' => 'o','ᾠ' => 'o','ᾡ' => 'o','ᾢ' => 'o', 'ᾣ' => 'o','ᾤ' => 'o','ᾥ' => 'o','ᾦ' => 'o','ᾧ' => 'o','ὼ' => 'o', 'ῲ' => 'o','ῳ' => 'o','ῴ' => 'o','ῶ' => 'o','ῷ' => 'o','А' => 'A', 'Б' => 'B','В' => 'V','Г' => 'G','Д' => 'D','Е' => 'E','Ё' => 'E', 'Ж' => 'Z','З' => 'Z','И' => 'I','Й' => 'I','К' => 'K','Л' => 'L', 'М' => 'M','Н' => 'N','О' => 'O','П' => 'P','Р' => 'R','С' => 'S', 'Т' => 'T','У' => 'U','Ф' => 'F','Х' => 'K','Ц' => 'T','Ч' => 'C', 'Ш' => 'S','Щ' => 'S','Ы' => 'Y','Э' => 'E','Ю' => 'Y','Я' => 'Y', 'а' => 'A','б' => 'B','в' => 'V','г' => 'G','д' => 'D','е' => 'E', 'ё' => 'E','ж' => 'Z','з' => 'Z','и' => 'I','й' => 'I','к' => 'K', 'л' => 'L','м' => 'M','н' => 'N','о' => 'O','п' => 'P','р' => 'R', 'с' => 'S','т' => 'T','у' => 'U','ф' => 'F','х' => 'K','ц' => 'T', 'ч' => 'C','ш' => 'S','щ' => 'S','ы' => 'Y','э' => 'E','ю' => 'Y', 'я' => 'Y','ð' => 'd','Ð' => 'D','þ' => 't','Þ' => 'T','ა' => 'a', 'ბ' => 
'b','გ' => 'g','დ' => 'd','ე' => 'e','ვ' => 'v','ზ' => 'z', 'თ' => 't','ი' => 'i','კ' => 'k','ლ' => 'l','მ' => 'm','ნ' => 'n', 'ო' => 'o','პ' => 'p','ჟ' => 'z','რ' => 'r','ს' => 's','ტ' => 't', 'უ' => 'u','ფ' => 'p','ქ' => 'k','ღ' => 'g','ყ' => 'q','შ' => 's', 'ჩ' => 'c','ც' => 't','ძ' => 'd','წ' => 't','ჭ' => 'c','ხ' => 'k', 'ჯ' => 'j','ჰ' => 'h' ); $str = str_replace( array_keys( $transliteration ), array_values( $transliteration ), $str); return $str; } //- remove_accents()

quantme · Answer

Short str_replace usage with custom characters:

<?php
$original_string = "¿Dónde está el niño que vive aquí? En el témpano o en el iglú. ÁFRICA, MÉXICO, ÍNDICE, CANCIÓN y NÚMERO.";

$some_special_chars = array("á", "é", "í", "ó", "ú", "Á", "É", "Í", "Ó", "Ú", "ñ", "Ñ");
$replacement_chars  = array("a", "e", "i", "o", "u", "A", "E", "I", "O", "U", "n", "N");

$replaced_string = str_replace($some_special_chars, $replacement_chars, $original_string);

echo $replaced_string; // outputs '¿Donde esta el nino que vive aqui? En el tempano o en el iglu. AFRICA, MEXICO, INDICE, CANCION y NUMERO.'
?>

eleg · Answer

Use PEAR I18N_UnicodeNormalizer-1.0.

include('…');
echo preg_replace(
    '/(\P{L})/ui', // replace all except members of Unicode class "letters", case insensitive
    '',            // with nothing
    I18N_UnicodeNormalizer::toNFKD('ÅÉÏÔÙåéïôù') // ù → u + `
);

→ AEIOUaeiou

lethargy · Answer

If none of the other solutions work for you, here's what worked for me:

<?php
$string = "áéíóúÁ—whatever";

// create an array of the hex codes of the characters you want to replace (formatted as shown)
// and whatever you want to replace them with.
$characters = array(
    "[\xF3]" => "&oacute;", //ó
    "[\xFC]" => "&uuml;",   //ü
    "[\xF1]" => "&ntilde;", //ñ
    "[\xEB]" => "&euml;",   //ë
    "[\xE9]" => "&eacute;", //é
    "[\xBD]" => "&frac12;", //½
);

// note that you must use a two-digit hex code for whatever reason.
// So, for example, although the hex code for an em dash is 2014, you have to use 97 instead. ("[\x97]" => "&mdash;")

// separate the key->value array into two separate arrays.
// Or just make two arrays from the beginning, but it's easier to read this way.
$replaceThis = array();
$replaceWith = array();
foreach ($characters as $hex => $html) {
    $replaceThis[] = $hex;
    $replaceWith[] = $html;
}

$string = preg_replace($replaceThis, $replaceWith, $string);
?>

It may not be the most elegant solution, but it works and doesn't require any knowledge of regular expressions.
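A likely explanation for the two-digit codes, sketched below under the assumption that the iconv extension is available: the patterns above match single bytes, so they fit text stored in a one-byte encoding such as Windows-1252, where the em dash is the byte 0x97 and ó is 0xF3. In UTF-8 the same characters take several bytes.

```php
<?php
// U+2014 (em dash) and U+00F3 (ó), written with PHP 7 code point escapes.
$emDash = "\u{2014}";
$oAcute = "\u{00F3}";

// In UTF-8 these characters occupy several bytes each.
echo bin2hex($emDash), "\n"; // e28094
echo bin2hex($oAcute), "\n"; // c3b3

// Converted to Windows-1252 they become the single bytes used in the patterns above.
$emDashCp1252 = iconv('UTF-8', 'Windows-1252', $emDash);
$oAcuteCp1252 = iconv('UTF-8', 'Windows-1252', $oAcute);
echo bin2hex($emDashCp1252), "\n"; // 97
echo bin2hex($oAcuteCp1252), "\n"; // f3
```

If the input is UTF-8, these byte patterns will not match the intended characters, so the encoding of the input is worth checking first.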

question_about_the_problem · Answer

Especially when comparing texts with each other or with keywords, it helps to normalize the texts first. The following function removes all diacritics (marks such as accents) from a given UTF-8 encoded text and returns ASCII text.

Make sure you have the PHP Normalizer extension (intl and ICU) installed.

Hint: you may also want to map the text to lowercase before running the matching procedures...

<?php
function normalizeUtf8String( $s)
{
    // keep the input so it can be returned unchanged if normalization is not possible
    $original_string = $s;

    // Normalizer-class missing!
    if (! class_exists("Normalizer", $autoload = false))
        return $original_string;

    // maps German (umlauts) and other European characters onto two characters before just removing diacritics
    $s = preg_replace( '@\x{00c4}@u' , "AE", $s ); // umlaut Ä => AE
    $s = preg_replace( '@\x{00d6}@u' , "OE", $s ); // umlaut Ö => OE
    $s = preg_replace( '@\x{00dc}@u' , "UE", $s ); // umlaut Ü => UE
    $s = preg_replace( '@\x{00e4}@u' , "ae", $s ); // umlaut ä => ae
    $s = preg_replace( '@\x{00f6}@u' , "oe", $s ); // umlaut ö => oe
    $s = preg_replace( '@\x{00fc}@u' , "ue", $s ); // umlaut ü => ue
    $s = preg_replace( '@\x{00f1}@u' , "ny", $s ); // ñ => ny
    $s = preg_replace( '@\x{00ff}@u' , "yu", $s ); // ÿ => yu

    // maps special characters (characters with diacritics) on their base-character followed by the diacritical mark
    // example: Ú => U´, á => a´
    $s = Normalizer::normalize( $s, Normalizer::FORM_D );

    $s = preg_replace( '@\pM@u' , "", $s ); // removes diacritics

    $s = preg_replace( '@\x{00df}@u' , "ss", $s ); // maps German ß onto ss
    $s = preg_replace( '@\x{00c6}@u' , "AE", $s ); // Æ => AE
    $s = preg_replace( '@\x{00e6}@u' , "ae", $s ); // æ => ae
    $s = preg_replace( '@\x{0132}@u' , "IJ", $s ); // Ĳ => IJ
    $s = preg_replace( '@\x{0133}@u' , "ij", $s ); // ĳ => ij
    $s = preg_replace( '@\x{0152}@u' , "OE", $s ); // Œ => OE
    $s = preg_replace( '@\x{0153}@u' , "oe", $s ); // œ => oe

    $s = preg_replace( '@\x{00d0}@u' , "D", $s ); // Ð => D
    $s = preg_replace( '@\x{0110}@u' , "D", $s ); // Đ => D
    $s = preg_replace( '@\x{00f0}@u' , "d", $s ); // ð => d
    $s = preg_replace( '@\x{0111}@u' , "d", $s ); // đ => d
    $s = preg_replace( '@\x{0126}@u' , "H", $s ); // Ħ => H
    $s = preg_replace( '@\x{0127}@u' , "h", $s ); // ħ => h
    $s = preg_replace( '@\x{0131}@u' , "i", $s ); // ı => i
    $s = preg_replace( '@\x{0138}@u' , "k", $s ); // ĸ => k
    $s = preg_replace( '@\x{013f}@u' , "L", $s ); // Ŀ => L
    $s = preg_replace( '@\x{0141}@u' , "L", $s ); // Ł => L
    $s = preg_replace( '@\x{0140}@u' , "l", $s ); // ŀ => l
    $s = preg_replace( '@\x{0142}@u' , "l", $s ); // ł => l
    $s = preg_replace( '@\x{014a}@u' , "N", $s ); // Ŋ => N
    $s = preg_replace( '@\x{0149}@u' , "n", $s ); // ŉ => n
    $s = preg_replace( '@\x{014b}@u' , "n", $s ); // ŋ => n
    $s = preg_replace( '@\x{00d8}@u' , "O", $s ); // Ø => O
    $s = preg_replace( '@\x{00f8}@u' , "o", $s ); // ø => o
    $s = preg_replace( '@\x{017f}@u' , "s", $s ); // ſ => s
    $s = preg_replace( '@\x{00de}@u' , "T", $s ); // Þ => T
    $s = preg_replace( '@\x{0166}@u' , "T", $s ); // Ŧ => T
    $s = preg_replace( '@\x{00fe}@u' , "t", $s ); // þ => t
    $s = preg_replace( '@\x{0167}@u' , "t", $s ); // ŧ => t

    // remove all non-ASCii characters (ASCII ends at 0x7f)
    $s = preg_replace( '@[^\0-\x7f]@u' , "", $s );

    // possible errors in UTF8-regular-expressions
    if (empty($s))
        return $original_string;
    else
        return $s;
}
?>

The function above is mainly based on the following article: http://ahinea.com/en/tech/accented-translate.html

Pascal MARTIN · Answer

People often use str_replace or strtr with a big list of characters to convert "from" and "to", even if that doesn't look very nice...
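A minimal sketch of that approach, with a deliberately tiny table (the `$map` contents and the function name are illustrative, not a complete list). Passing strtr() an array makes it replace whole substrings, which keeps multi-byte UTF-8 characters intact; the three-argument string form works byte by byte and is not safe for UTF-8.

```php
<?php
// Illustrative helper: replace accented letters using a lookup table.
function unaccent_with_table(string $text): string
{
    // Tiny sample table; a real one would need many more entries.
    $map = [
        'ã' => 'a', 'á' => 'a', 'à' => 'a',
        'é' => 'e', 'è' => 'e', 'ê' => 'e',
        'ñ' => 'n', 'ç' => 'c',
    ];

    // Array form: each key is replaced by its value as a whole substring.
    return strtr($text, $map);
}

echo unaccent_with_table('ã, é, ñ'), "\n"; // a, e, n
```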

Another solution, I suppose, might be to use something like iconv with the //TRANSLIT option, but it doesn't always work, from what I remember...
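For reference, a sketch of the iconv variant, assuming the iconv extension is available (the wrapper name is hypothetical). The result depends on the underlying iconv implementation and the current locale: the same "é" may come back as "e", "'e" or "?", which is probably why it does not always work.

```php
<?php
// Hypothetical wrapper around iconv's transliteration mode.
function to_ascii_translit(string $text): string
{
    // //TRANSLIT asks iconv to approximate characters that ASCII cannot represent.
    // The @ suppresses the notice iconv raises when it cannot convert a character.
    $result = @iconv('UTF-8', 'ASCII//TRANSLIT', $text);

    // iconv() returns false on failure; fall back to the original text in that case.
    return $result === false ? $text : $result;
}

// Transliteration can depend on the locale, so try to select a UTF-8 one.
// setlocale() simply returns false if this locale is not installed.
setlocale(LC_CTYPE, 'en_US.UTF-8');

echo to_ascii_translit('ãé'), "\n"; // output varies by platform
```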

Also, if you are using PHP 5.3, the new Normalizer class might be interesting ;-)