web-dev-qa-db-fra.com

Remplacement des caractères accentués php

J'essaie de remplacer les caractères accentués par les remplacements normaux. Voici ce que je suis en train de faire.

    $string = "Éric Cantona";
    $strict = strtolower($string);

    echo "After Lower: ".$strict;

    $patterns[0] = '/[á|â|à|å|ä]/';
    $patterns[1] = '/[ð|é|ê|è|ë]/';
    $patterns[2] = '/[í|î|ì|ï]/';
    $patterns[3] = '/[ó|ô|ò|ø|õ|ö]/';
    $patterns[4] = '/[ú|û|ù|ü]/';
    $patterns[5] = '/æ/';
    $patterns[6] = '/ç/';
    $patterns[7] = '/ß/';
    $replacements[0] = 'a';
    $replacements[1] = 'e';
    $replacements[2] = 'i';
    $replacements[3] = 'o';
    $replacements[4] = 'u';
    $replacements[5] = 'ae';
    $replacements[6] = 'c';
    $replacements[7] = 'ss';

    $strict = preg_replace($patterns, $replacements, $strict);
    echo "Final: ".$strict;

Cela me donne:

    After Lower: éric cantona
    Final: ric cantona

Ce qui précède me donne ric cantona Je veux que la sortie soit eric cantona.

quelqu'un peut-il m'aider avec où je vais mal?

69
Lizard

J'ai essayé toutes les sortes en fonction des variantes énumérées dans les réponses, mais les suivantes ont fonctionné:

$unwanted_array = array(    'Š'=>'S', 'š'=>'s', 'Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
                            'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U',
                            'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss', 'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c',
                            'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o',
                            'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y' );
$str = strtr( $str, $unwanted_array );
149
Lizard

Pour supprimer les signes diacritiques, utilisez iconv:

$val = iconv('ISO-8859-1','ASCII//TRANSLIT',$val);

ou

$val = iconv('UTF-8','ASCII//TRANSLIT',$val);

notez que php a un bogue étrange dans le sens où il doit (parfois?) avoir un environnement local défini pour que ces conversions fonctionnent, en utilisant setlocale ().

edit juste testé, il récupère la plupart de vos signes diacritiques:

$val = "á|â|à|å|ä ð|é|ê|è|ë í|î|ì|ï ó|ô|ò|ø|õ|ö ú|û|ù|ü æ ç ß abc ABC 123";
echo iconv('UTF-8','ASCII//TRANSLIT',$val); 

sortie:

a|a|a|a|a ?|e|e|e|e i|i|i|i o|o|o|?|o|o u|u|u|u ae c ss abc ABC 123

Donc, vous voudrez peut-être corriger ces deux problèmes étranges avant d'appeler iconv, ou vous plonger dans les rouages ​​de PHP et les réparer.

73
mvds

Je viens de trouver la réponse de Lizard qui est extrêmement utile - en particulier lorsque vous effectuez un tri. N'est-ce pas beau combien de caractères nous devons dire la même chose;)

Si quelqu'un d'autre recherche une solution complète (d'après les commentaires ci-dessus), voici le copier-coller:

/**
 * Replace language-specific characters by ASCII-equivalents.
 * @param string $s
 * @return string
 */
public static function normalizeChars($s) {
    $replace = array(
        'ъ'=>'-', 'Ь'=>'-', 'Ъ'=>'-', 'ь'=>'-',
        'Ă'=>'A', 'Ą'=>'A', 'À'=>'A', 'Ã'=>'A', 'Á'=>'A', 'Æ'=>'A', 'Â'=>'A', 'Å'=>'A', 'Ä'=>'Ae',
        'Þ'=>'B',
        'Ć'=>'C', 'ץ'=>'C', 'Ç'=>'C',
        'È'=>'E', 'Ę'=>'E', 'É'=>'E', 'Ë'=>'E', 'Ê'=>'E',
        'Ğ'=>'G',
        'İ'=>'I', 'Ï'=>'I', 'Î'=>'I', 'Í'=>'I', 'Ì'=>'I',
        'Ł'=>'L',
        'Ñ'=>'N', 'Ń'=>'N',
        'Ø'=>'O', 'Ó'=>'O', 'Ò'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'Oe',
        'Ş'=>'S', 'Ś'=>'S', 'Ș'=>'S', 'Š'=>'S',
        'Ț'=>'T',
        'Ù'=>'U', 'Û'=>'U', 'Ú'=>'U', 'Ü'=>'Ue',
        'Ý'=>'Y',
        'Ź'=>'Z', 'Ž'=>'Z', 'Ż'=>'Z',
        'â'=>'a', 'ǎ'=>'a', 'ą'=>'a', 'á'=>'a', 'ă'=>'a', 'ã'=>'a', 'Ǎ'=>'a', 'а'=>'a', 'А'=>'a', 'å'=>'a', 'à'=>'a', 'א'=>'a', 'Ǻ'=>'a', 'Ā'=>'a', 'ǻ'=>'a', 'ā'=>'a', 'ä'=>'ae', 'æ'=>'ae', 'Ǽ'=>'ae', 'ǽ'=>'ae',
        'б'=>'b', 'ב'=>'b', 'Б'=>'b', 'þ'=>'b',
        'ĉ'=>'c', 'Ĉ'=>'c', 'Ċ'=>'c', 'ć'=>'c', 'ç'=>'c', 'ц'=>'c', 'צ'=>'c', 'ċ'=>'c', 'Ц'=>'c', 'Č'=>'c', 'č'=>'c', 'Ч'=>'ch', 'ч'=>'ch',
        'ד'=>'d', 'ď'=>'d', 'Đ'=>'d', 'Ď'=>'d', 'đ'=>'d', 'д'=>'d', 'Д'=>'D', 'ð'=>'d',
        'є'=>'e', 'ע'=>'e', 'е'=>'e', 'Е'=>'e', 'Ə'=>'e', 'ę'=>'e', 'ĕ'=>'e', 'ē'=>'e', 'Ē'=>'e', 'Ė'=>'e', 'ė'=>'e', 'ě'=>'e', 'Ě'=>'e', 'Є'=>'e', 'Ĕ'=>'e', 'ê'=>'e', 'ə'=>'e', 'è'=>'e', 'ë'=>'e', 'é'=>'e',
        'ф'=>'f', 'ƒ'=>'f', 'Ф'=>'f',
        'ġ'=>'g', 'Ģ'=>'g', 'Ġ'=>'g', 'Ĝ'=>'g', 'Г'=>'g', 'г'=>'g', 'ĝ'=>'g', 'ğ'=>'g', 'ג'=>'g', 'Ґ'=>'g', 'ґ'=>'g', 'ģ'=>'g',
        'ח'=>'h', 'ħ'=>'h', 'Х'=>'h', 'Ħ'=>'h', 'Ĥ'=>'h', 'ĥ'=>'h', 'х'=>'h', 'ה'=>'h',
        'î'=>'i', 'ï'=>'i', 'í'=>'i', 'ì'=>'i', 'į'=>'i', 'ĭ'=>'i', 'ı'=>'i', 'Ĭ'=>'i', 'И'=>'i', 'ĩ'=>'i', 'ǐ'=>'i', 'Ĩ'=>'i', 'Ǐ'=>'i', 'и'=>'i', 'Į'=>'i', 'י'=>'i', 'Ї'=>'i', 'Ī'=>'i', 'І'=>'i', 'ї'=>'i', 'і'=>'i', 'ī'=>'i', 'ij'=>'ij', 'IJ'=>'ij',
        'й'=>'j', 'Й'=>'j', 'Ĵ'=>'j', 'ĵ'=>'j', 'я'=>'ja', 'Я'=>'ja', 'Э'=>'je', 'э'=>'je', 'ё'=>'jo', 'Ё'=>'jo', 'ю'=>'ju', 'Ю'=>'ju',
        'ĸ'=>'k', 'כ'=>'k', 'Ķ'=>'k', 'К'=>'k', 'к'=>'k', 'ķ'=>'k', 'ך'=>'k',
        'Ŀ'=>'l', 'ŀ'=>'l', 'Л'=>'l', 'ł'=>'l', 'ļ'=>'l', 'ĺ'=>'l', 'Ĺ'=>'l', 'Ļ'=>'l', 'л'=>'l', 'Ľ'=>'l', 'ľ'=>'l', 'ל'=>'l',
        'מ'=>'m', 'М'=>'m', 'ם'=>'m', 'м'=>'m',
        'ñ'=>'n', 'н'=>'n', 'Ņ'=>'n', 'ן'=>'n', 'ŋ'=>'n', 'נ'=>'n', 'Н'=>'n', 'ń'=>'n', 'Ŋ'=>'n', 'ņ'=>'n', 'ʼn'=>'n', 'Ň'=>'n', 'ň'=>'n',
        'о'=>'o', 'О'=>'o', 'ő'=>'o', 'õ'=>'o', 'ô'=>'o', 'Ő'=>'o', 'ŏ'=>'o', 'Ŏ'=>'o', 'Ō'=>'o', 'ō'=>'o', 'ø'=>'o', 'ǿ'=>'o', 'ǒ'=>'o', 'ò'=>'o', 'Ǿ'=>'o', 'Ǒ'=>'o', 'ơ'=>'o', 'ó'=>'o', 'Ơ'=>'o', 'œ'=>'oe', 'Œ'=>'oe', 'ö'=>'oe',
        'פ'=>'p', 'ף'=>'p', 'п'=>'p', 'П'=>'p',
        'ק'=>'q',
        'ŕ'=>'r', 'ř'=>'r', 'Ř'=>'r', 'ŗ'=>'r', 'Ŗ'=>'r', 'ר'=>'r', 'Ŕ'=>'r', 'Р'=>'r', 'р'=>'r',
        'ș'=>'s', 'с'=>'s', 'Ŝ'=>'s', 'š'=>'s', 'ś'=>'s', 'ס'=>'s', 'ş'=>'s', 'С'=>'s', 'ŝ'=>'s', 'Щ'=>'sch', 'щ'=>'sch', 'ш'=>'sh', 'Ш'=>'sh', 'ß'=>'ss',
        'т'=>'t', 'ט'=>'t', 'ŧ'=>'t', 'ת'=>'t', 'ť'=>'t', 'ţ'=>'t', 'Ţ'=>'t', 'Т'=>'t', 'ț'=>'t', 'Ŧ'=>'t', 'Ť'=>'t', '™'=>'tm',
        'ū'=>'u', 'у'=>'u', 'Ũ'=>'u', 'ũ'=>'u', 'Ư'=>'u', 'ư'=>'u', 'Ū'=>'u', 'Ǔ'=>'u', 'ų'=>'u', 'Ų'=>'u', 'ŭ'=>'u', 'Ŭ'=>'u', 'Ů'=>'u', 'ů'=>'u', 'ű'=>'u', 'Ű'=>'u', 'Ǖ'=>'u', 'ǔ'=>'u', 'Ǜ'=>'u', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'У'=>'u', 'ǚ'=>'u', 'ǜ'=>'u', 'Ǚ'=>'u', 'Ǘ'=>'u', 'ǖ'=>'u', 'ǘ'=>'u', 'ü'=>'ue',
        'в'=>'v', 'ו'=>'v', 'В'=>'v',
        'ש'=>'w', 'ŵ'=>'w', 'Ŵ'=>'w',
        'ы'=>'y', 'ŷ'=>'y', 'ý'=>'y', 'ÿ'=>'y', 'Ÿ'=>'y', 'Ŷ'=>'y',
        'Ы'=>'y', 'ž'=>'z', 'З'=>'z', 'з'=>'z', 'ź'=>'z', 'ז'=>'z', 'ż'=>'z', 'ſ'=>'z', 'Ж'=>'zh', 'ж'=>'zh'
    );
    return strtr($s, $replace);
}

Notez quelques légères modifications concernant les trémas allemands (ä => ae)

Edit: Inclus plus de caractères en fonction de la publication de l'utilisateur3682119 (à l'exception du symbole de copyright) et du commentaire de daker.

38
BurninLeo

Dans PHP 5.4 l'extension intl fournit une nouvelle classe appelée Transliterator.

Je crois que c'est la meilleure façon de supprimer les signes diacritiques pour deux raisons:

  1. Transliterator est basé sur ICU, vous utilisez donc les tableaux de la bibliothèque ICU. ICU est un excellent projet développé au cours de l'année pour fournir des tableaux complets Quelles que soient les tables que vous souhaitiez écrire vous-même, elles ne seront jamais aussi complètes que celles de ICU.

  2. En UTF-8, les caractères peuvent être représentés différemment. Par exemple, le caractère ñ peut être enregistré en tant que caractère unique (multi-octets) ou en tant que combinaison de caractères ˜ (multi-octets) et n. En outre, certains caractères Unicode sont homographes: ils se ressemblent tout en ayant des points de code différents. Pour cette raison, il est également important de normaliser la chaîne.

Voici un exemple de code, tiré de ne de mes anciennes réponses :

<?php
$transliterator = Transliterator::createFromRules(':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;', Transliterator::FORWARD);
$test = ['abcd', 'èe', '€', 'àòùìéëü', 'àòùìéëü', 'tiësto'];
foreach($test as $e) {
    $normalized = $transliterator->transliterate($e);
    echo $e. ' --> '.$normalized."\n";
}
?>

Résultat:

abcd --> abcd
èe --> ee
€ --> €
àòùìéëü --> aouieeu
àòùìéëü --> aouieeu
tiësto --> tiesto

Le premier argument de la classe Transliterator supprime les signes diacritiques ainsi que la normalisation de la chaîne.

25
ItalyPaleAle

Cela a fonctionné pour moi:

<?php
setlocale(LC_ALL, "en_US.utf8"); 
$val = iconv('UTF-8','ASCII//TRANSLIT',$val);
?>
9
Stergios Zg.

J'ai donc trouvé ceci sur la page php.net pour la fonction preg_replace

// replace accented chars

$string = "Zacarías Ferreíra"; // my definition for string variable
$accents = '/&([A-Za-z]{1,2})(Grave|acute|circ|cedil|uml|lig);/';

$string_encoded = htmlentities($string,ENT_NOQUOTES,'UTF-8');

$string = preg_replace($accents,'$1',$string_encoded);

Si vous avez des problèmes d’encodage, vous risquez d’obtenir quelque chose comme "Zacarès Ferreira", il suffit de décoder la chaîne et d’utiliser le code ci-dessus.

$string = utf8_decode("Zacarías Ferreíra");
9
Kasey Thomas
protected $_convertTable = array(
    '&amp;' => 'and',   '@' => 'at',    '©' => 'c', '®' => 'r', 'À' => 'a',
    'Á' => 'a', 'Â' => 'a', 'Ä' => 'a', 'Å' => 'a', 'Æ' => 'ae','Ç' => 'c',
    'È' => 'e', 'É' => 'e', 'Ë' => 'e', 'Ì' => 'i', 'Í' => 'i', 'Î' => 'i',
    'Ï' => 'i', 'Ò' => 'o', 'Ó' => 'o', 'Ô' => 'o', 'Õ' => 'o', 'Ö' => 'o',
    'Ø' => 'o', 'Ù' => 'u', 'Ú' => 'u', 'Û' => 'u', 'Ü' => 'u', 'Ý' => 'y',
    'ß' => 'ss','à' => 'a', 'á' => 'a', 'â' => 'a', 'ä' => 'a', 'å' => 'a',
    'æ' => 'ae','ç' => 'c', 'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
    'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ò' => 'o', 'ó' => 'o',
    'ô' => 'o', 'õ' => 'o', 'ö' => 'o', 'ø' => 'o', 'ù' => 'u', 'ú' => 'u',
    'û' => 'u', 'ü' => 'u', 'ý' => 'y', 'þ' => 'p', 'ÿ' => 'y', 'Ā' => 'a',
    'ā' => 'a', 'Ă' => 'a', 'ă' => 'a', 'Ą' => 'a', 'ą' => 'a', 'Ć' => 'c',
    'ć' => 'c', 'Ĉ' => 'c', 'ĉ' => 'c', 'Ċ' => 'c', 'ċ' => 'c', 'Č' => 'c',
    'č' => 'c', 'Ď' => 'd', 'ď' => 'd', 'Đ' => 'd', 'đ' => 'd', 'Ē' => 'e',
    'ē' => 'e', 'Ĕ' => 'e', 'ĕ' => 'e', 'Ė' => 'e', 'ė' => 'e', 'Ę' => 'e',
    'ę' => 'e', 'Ě' => 'e', 'ě' => 'e', 'Ĝ' => 'g', 'ĝ' => 'g', 'Ğ' => 'g',
    'ğ' => 'g', 'Ġ' => 'g', 'ġ' => 'g', 'Ģ' => 'g', 'ģ' => 'g', 'Ĥ' => 'h',
    'ĥ' => 'h', 'Ħ' => 'h', 'ħ' => 'h', 'Ĩ' => 'i', 'ĩ' => 'i', 'Ī' => 'i',
    'ī' => 'i', 'Ĭ' => 'i', 'ĭ' => 'i', 'Į' => 'i', 'į' => 'i', 'İ' => 'i',
    'ı' => 'i', 'IJ' => 'ij','ij' => 'ij','Ĵ' => 'j', 'ĵ' => 'j', 'Ķ' => 'k',
    'ķ' => 'k', 'ĸ' => 'k', 'Ĺ' => 'l', 'ĺ' => 'l', 'Ļ' => 'l', 'ļ' => 'l',
    'Ľ' => 'l', 'ľ' => 'l', 'Ŀ' => 'l', 'ŀ' => 'l', 'Ł' => 'l', 'ł' => 'l',
    'Ń' => 'n', 'ń' => 'n', 'Ņ' => 'n', 'ņ' => 'n', 'Ň' => 'n', 'ň' => 'n',
    'ʼn' => 'n', 'Ŋ' => 'n', 'ŋ' => 'n', 'Ō' => 'o', 'ō' => 'o', 'Ŏ' => 'o',
    'ŏ' => 'o', 'Ő' => 'o', 'ő' => 'o', 'Œ' => 'oe','œ' => 'oe','Ŕ' => 'r',
    'ŕ' => 'r', 'Ŗ' => 'r', 'ŗ' => 'r', 'Ř' => 'r', 'ř' => 'r', 'Ś' => 's',
    'ś' => 's', 'Ŝ' => 's', 'ŝ' => 's', 'Ş' => 's', 'ş' => 's', 'Š' => 's',
    'š' => 's', 'Ţ' => 't', 'ţ' => 't', 'Ť' => 't', 'ť' => 't', 'Ŧ' => 't',
    'ŧ' => 't', 'Ũ' => 'u', 'ũ' => 'u', 'Ū' => 'u', 'ū' => 'u', 'Ŭ' => 'u',
    'ŭ' => 'u', 'Ů' => 'u', 'ů' => 'u', 'Ű' => 'u', 'ű' => 'u', 'Ų' => 'u',
    'ų' => 'u', 'Ŵ' => 'w', 'ŵ' => 'w', 'Ŷ' => 'y', 'ŷ' => 'y', 'Ÿ' => 'y',
    'Ź' => 'z', 'ź' => 'z', 'Ż' => 'z', 'ż' => 'z', 'Ž' => 'z', 'ž' => 'z',
    'ſ' => 'z', 'Ə' => 'e', 'ƒ' => 'f', 'Ơ' => 'o', 'ơ' => 'o', 'Ư' => 'u',
    'ư' => 'u', 'Ǎ' => 'a', 'ǎ' => 'a', 'Ǐ' => 'i', 'ǐ' => 'i', 'Ǒ' => 'o',
    'ǒ' => 'o', 'Ǔ' => 'u', 'ǔ' => 'u', 'Ǖ' => 'u', 'ǖ' => 'u', 'Ǘ' => 'u',
    'ǘ' => 'u', 'Ǚ' => 'u', 'ǚ' => 'u', 'Ǜ' => 'u', 'ǜ' => 'u', 'Ǻ' => 'a',
    'ǻ' => 'a', 'Ǽ' => 'ae','ǽ' => 'ae','Ǿ' => 'o', 'ǿ' => 'o', 'ə' => 'e',
    'Ё' => 'jo','Є' => 'e', 'І' => 'i', 'Ї' => 'i', 'А' => 'a', 'Б' => 'b',
    'В' => 'v', 'Г' => 'g', 'Д' => 'd', 'Е' => 'e', 'Ж' => 'zh','З' => 'z',
    'И' => 'i', 'Й' => 'j', 'К' => 'k', 'Л' => 'l', 'М' => 'm', 'Н' => 'n',
    'О' => 'o', 'П' => 'p', 'Р' => 'r', 'С' => 's', 'Т' => 't', 'У' => 'u',
    'Ф' => 'f', 'Х' => 'h', 'Ц' => 'c', 'Ч' => 'ch','Ш' => 'sh','Щ' => 'sch',
    'Ъ' => '-', 'Ы' => 'y', 'Ь' => '-', 'Э' => 'je','Ю' => 'ju','Я' => 'ja',
    'а' => 'a', 'б' => 'b', 'в' => 'v', 'г' => 'g', 'д' => 'd', 'е' => 'e',
    'ж' => 'zh','з' => 'z', 'и' => 'i', 'й' => 'j', 'к' => 'k', 'л' => 'l',
    'м' => 'm', 'н' => 'n', 'о' => 'o', 'п' => 'p', 'р' => 'r', 'с' => 's',
    'т' => 't', 'у' => 'u', 'ф' => 'f', 'х' => 'h', 'ц' => 'c', 'ч' => 'ch',
    'ш' => 'sh','щ' => 'sch','ъ' => '-','ы' => 'y', 'ь' => '-', 'э' => 'je',
    'ю' => 'ju','я' => 'ja','ё' => 'jo','є' => 'e', 'і' => 'i', 'ї' => 'i',
    'Ґ' => 'g', 'ґ' => 'g', 'א' => 'a', 'ב' => 'b', 'ג' => 'g', 'ד' => 'd',
    'ה' => 'h', 'ו' => 'v', 'ז' => 'z', 'ח' => 'h', 'ט' => 't', 'י' => 'i',
    'ך' => 'k', 'כ' => 'k', 'ל' => 'l', 'ם' => 'm', 'מ' => 'm', 'ן' => 'n',
    'נ' => 'n', 'ס' => 's', 'ע' => 'e', 'ף' => 'p', 'פ' => 'p', 'ץ' => 'C',
    'צ' => 'c', 'ק' => 'q', 'ר' => 'r', 'ש' => 'w', 'ת' => 't', '™' => 'tm',
);

De magento, je l’utilise pour pratiquement tout

8
user3682119

Une réponse actualisée basée sur la réponse de @ BurninLeo

function replace_spec_char($subject) {
    $char_map = array(
        "ъ" => "-", "ь" => "-", "Ъ" => "-", "Ь" => "-",
        "А" => "A", "Ă" => "A", "Ǎ" => "A", "Ą" => "A", "À" => "A", "Ã" => "A", "Á" => "A", "Æ" => "A", "Â" => "A", "Å" => "A", "Ǻ" => "A", "Ā" => "A", "א" => "A",
        "Б" => "B", "ב" => "B", "Þ" => "B",
        "Ĉ" => "C", "Ć" => "C", "Ç" => "C", "Ц" => "C", "צ" => "C", "Ċ" => "C", "Č" => "C", "©" => "C", "ץ" => "C",
        "Д" => "D", "Ď" => "D", "Đ" => "D", "ד" => "D", "Ð" => "D",
        "È" => "E", "Ę" => "E", "É" => "E", "Ë" => "E", "Ê" => "E", "Е" => "E", "Ē" => "E", "Ė" => "E", "Ě" => "E", "Ĕ" => "E", "Є" => "E", "Ə" => "E", "ע" => "E",
        "Ф" => "F", "Ƒ" => "F",
        "Ğ" => "G", "Ġ" => "G", "Ģ" => "G", "Ĝ" => "G", "Г" => "G", "ג" => "G", "Ґ" => "G",
        "ח" => "H", "Ħ" => "H", "Х" => "H", "Ĥ" => "H", "ה" => "H",
        "I" => "I", "Ï" => "I", "Î" => "I", "Í" => "I", "Ì" => "I", "Į" => "I", "Ĭ" => "I", "I" => "I", "И" => "I", "Ĩ" => "I", "Ǐ" => "I", "י" => "I", "Ї" => "I", "Ī" => "I", "І" => "I",
        "Й" => "J", "Ĵ" => "J",
        "ĸ" => "K", "כ" => "K", "Ķ" => "K", "К" => "K", "ך" => "K",
        "Ł" => "L", "Ŀ" => "L", "Л" => "L", "Ļ" => "L", "Ĺ" => "L", "Ľ" => "L", "ל" => "L",
        "מ" => "M", "М" => "M", "ם" => "M",
        "Ñ" => "N", "Ń" => "N", "Н" => "N", "Ņ" => "N", "ן" => "N", "Ŋ" => "N", "נ" => "N", "ʼn" => "N", "Ň" => "N",
        "Ø" => "O", "Ó" => "O", "Ò" => "O", "Ô" => "O", "Õ" => "O", "О" => "O", "Ő" => "O", "Ŏ" => "O", "Ō" => "O", "Ǿ" => "O", "Ǒ" => "O", "Ơ" => "O",
        "פ" => "P", "ף" => "P", "П" => "P",
        "ק" => "Q",
        "Ŕ" => "R", "Ř" => "R", "Ŗ" => "R", "ר" => "R", "Р" => "R", "®" => "R",
        "Ş" => "S", "Ś" => "S", "Ș" => "S", "Š" => "S", "С" => "S", "Ŝ" => "S", "ס" => "S",
        "Т" => "T", "Ț" => "T", "ט" => "T", "Ŧ" => "T", "ת" => "T", "Ť" => "T", "Ţ" => "T",
        "Ù" => "U", "Û" => "U", "Ú" => "U", "Ū" => "U", "У" => "U", "Ũ" => "U", "Ư" => "U", "Ǔ" => "U", "Ų" => "U", "Ŭ" => "U", "Ů" => "U", "Ű" => "U", "Ǖ" => "U", "Ǜ" => "U", "Ǚ" => "U", "Ǘ" => "U",
        "В" => "V", "ו" => "V",
        "Ý" => "Y", "Ы" => "Y", "Ŷ" => "Y", "Ÿ" => "Y",
        "Ź" => "Z", "Ž" => "Z", "Ż" => "Z", "З" => "Z", "ז" => "Z",
        "а" => "a", "ă" => "a", "ǎ" => "a", "ą" => "a", "à" => "a", "ã" => "a", "á" => "a", "æ" => "a", "â" => "a", "å" => "a", "ǻ" => "a", "ā" => "a", "א" => "a",
        "б" => "b", "ב" => "b", "þ" => "b",
        "ĉ" => "c", "ć" => "c", "ç" => "c", "ц" => "c", "צ" => "c", "ċ" => "c", "č" => "c", "©" => "c", "ץ" => "c",
        "Ч" => "ch", "ч" => "ch",
        "д" => "d", "ď" => "d", "đ" => "d", "ד" => "d", "ð" => "d",
        "è" => "e", "ę" => "e", "é" => "e", "ë" => "e", "ê" => "e", "е" => "e", "ē" => "e", "ė" => "e", "ě" => "e", "ĕ" => "e", "є" => "e", "ə" => "e", "ע" => "e",
        "ф" => "f", "ƒ" => "f",
        "ğ" => "g", "ġ" => "g", "ģ" => "g", "ĝ" => "g", "г" => "g", "ג" => "g", "ґ" => "g",
        "ח" => "h", "ħ" => "h", "х" => "h", "ĥ" => "h", "ה" => "h",
        "i" => "i", "ï" => "i", "î" => "i", "í" => "i", "ì" => "i", "į" => "i", "ĭ" => "i", "ı" => "i", "и" => "i", "ĩ" => "i", "ǐ" => "i", "י" => "i", "ї" => "i", "ī" => "i", "і" => "i",
        "й" => "j", "Й" => "j", "Ĵ" => "j", "ĵ" => "j",
        "ĸ" => "k", "כ" => "k", "ķ" => "k", "к" => "k", "ך" => "k",
        "ł" => "l", "ŀ" => "l", "л" => "l", "ļ" => "l", "ĺ" => "l", "ľ" => "l", "ל" => "l",
        "מ" => "m", "м" => "m", "ם" => "m",
        "ñ" => "n", "ń" => "n", "н" => "n", "ņ" => "n", "ן" => "n", "ŋ" => "n", "נ" => "n", "ʼn" => "n", "ň" => "n",
        "ø" => "o", "ó" => "o", "ò" => "o", "ô" => "o", "õ" => "o", "о" => "o", "ő" => "o", "ŏ" => "o", "ō" => "o", "ǿ" => "o", "ǒ" => "o", "ơ" => "o",
        "פ" => "p", "ף" => "p", "п" => "p",
        "ק" => "q",
        "ŕ" => "r", "ř" => "r", "ŗ" => "r", "ר" => "r", "р" => "r", "®" => "r",
        "ş" => "s", "ś" => "s", "ș" => "s", "š" => "s", "с" => "s", "ŝ" => "s", "ס" => "s",
        "т" => "t", "ț" => "t", "ט" => "t", "ŧ" => "t", "ת" => "t", "ť" => "t", "ţ" => "t",
        "ù" => "u", "û" => "u", "ú" => "u", "ū" => "u", "у" => "u", "ũ" => "u", "ư" => "u", "ǔ" => "u", "ų" => "u", "ŭ" => "u", "ů" => "u", "ű" => "u", "ǖ" => "u", "ǜ" => "u", "ǚ" => "u", "ǘ" => "u",
        "в" => "v", "ו" => "v",
        "ý" => "y", "ы" => "y", "ŷ" => "y", "ÿ" => "y",
        "ź" => "z", "ž" => "z", "ż" => "z", "з" => "z", "ז" => "z", "ſ" => "z",
        "™" => "tm",
        "@" => "at",
        "Ä" => "ae", "Ǽ" => "ae", "ä" => "ae", "æ" => "ae", "ǽ" => "ae",
        "ij" => "ij", "IJ" => "ij",
        "я" => "ja", "Я" => "ja",
        "Э" => "je", "э" => "je",
        "ё" => "jo", "Ё" => "jo",
        "ю" => "ju", "Ю" => "ju",
        "œ" => "oe", "Œ" => "oe", "ö" => "oe", "Ö" => "oe",
        "щ" => "sch", "Щ" => "sch",
        "ш" => "sh", "Ш" => "sh",
        "ß" => "ss",
        "Ü" => "ue",
        "Ж" => "zh", "ж" => "zh",
    );
    return strtr($subject, $char_map);
}

$string = "Ħí ŧħə®ë, юßť å test!";
echo replace_spec_char($string);

Ħí ŧħə®ë, юßť å test! => Hi there, jusst a test!

Ceci ne mélange pas les caractères majuscules et minuscules sauf pour les caractères plus longs (par exemple: ss, ch, sch), ajouté @ ® ©

De plus, si vous souhaitez créer une correspondance regex indépendamment des caractères spéciaux:

rss => '[rŕřŘŗŖרŔРр](?:[sșсŜšśסşСŝ][sșсŜšśסşСŝ]|[ß])'

Une implémentation vala de ceci: https://code.launchpad.net/~jeremy-munsch/synapse-project/ascii-smart/+merge/277477

Voici la liste de base avec laquelle vous pourriez travailler. Avec le remplacement de regex (dans un texte sublime) ou un petit script, vous pouvez construire n'importe quoi à partir de ce tableau pour répondre à vos besoins.

"-" => "ъьЪЬ",
"A" => "АĂǍĄÀÃÁÆÂÅǺĀא",
"B" => "БבÞ",
"C" => "ĈĆÇЦצĊČ©ץ",
"D" => "ДĎĐדÐ",
"E" => "ÈĘÉËÊЕĒĖĚĔЄƏע",
"F" => "ФƑ",
"G" => "ĞĠĢĜГגҐ",
"H" => "חĦХĤה",
"I" => "IÏÎÍÌĮĬIИĨǏיЇĪІ",
"J" => "ЙĴ",
"K" => "ĸכĶКך",
"L" => "ŁĿЛĻĹĽל",
"M" => "מМם",
"N" => "ÑŃНŅןŊנʼnŇ",
"O" => "ØÓÒÔÕОŐŎŌǾǑƠ",
"P" => "פףП",
"Q" => "ק",
"R" => "ŔŘŖרР®",
"S" => "ŞŚȘŠСŜס",
"T" => "ТȚטŦתŤŢ",
"U" => "ÙÛÚŪУŨƯǓŲŬŮŰǕǛǙǗ",
"V" => "Вו",
"Y" => "ÝЫŶŸ",
"Z" => "ŹŽŻЗז",
"a" => "аăǎąàãáæâåǻāא",
"b" => "бבþ",
"c" => "ĉćçцצċč©ץ",
"ch" => "ч",
"d" => "дďđדð",
"e" => "èęéëêеēėěĕєəע",
"f" => "фƒ",
"g" => "ğġģĝгגґ",
"h" => "חħхĥה",
"i" => "iïîíìįĭıиĩǐיїīі",
"j" => "йĵ",
"k" => "ĸכķкך",
"l" => "łŀлļĺľל",
"m" => "מмם",
"n" => "ñńнņןŋנʼnň",
"o" => "øóòôõоőŏōǿǒơ",
"p" => "פףп",
"q" => "ק",
"r" => "ŕřŗרр®",
"s" => "şśșšсŝס",
"t" => "тțטŧתťţ",
"u" => "ùûúūуũưǔųŭůűǖǜǚǘ",
"v" => "вו",
"y" => "ýыŷÿ",
"z" => "źžżзזſ",
"tm" => "™",
"at" => "@",
"ae" => "ÄǼäæǽ",
"ch" => "Чч",
"ij" => "ijIJ",
"j" => "йЙĴĵ",
"ja" => "яЯ",
"je" => "Ээ",
"jo" => "ёЁ",
"ju" => "юЮ",
"oe" => "œŒöÖ",
"sch" => "щЩ",
"sh" => "шШ",
"ss" => "ß",
"tm" => "™",
"ue" => "Ü",
"zh" => "Жж"
8
Kwaadpepper

Disclaimer: Je ne supporte plus cette réponse (j'étais aveugle à l'époque). Mais merci pour les votes positifs = P

Vous pouvez prendre cela comme base. À partir de WordPress, utilisé pour générer de jolies urls (le point d’entrée est la fonction slugify ()):

/**
 * Converts all accent characters to ASCII characters.
 *
 * If there are no accent characters, then the string given is just returned.
 *
 * @param string $string Text that might have accent characters
 * @return string Filtered string with replaced "Nice" characters.
 */

function remove_accents($string) {
 if (!preg_match('/[\x80-\xff]/', $string))
  return $string;
 if (seems_utf8($string)) {
  $chars = array(
  // Decompositions for Latin-1 Supplement
  chr(195).chr(128) => 'A', chr(195).chr(129) => 'A',
  chr(195).chr(130) => 'A', chr(195).chr(131) => 'A',
  chr(195).chr(132) => 'A', chr(195).chr(133) => 'A',
  chr(195).chr(135) => 'C', chr(195).chr(136) => 'E',
  chr(195).chr(137) => 'E', chr(195).chr(138) => 'E',
  chr(195).chr(139) => 'E', chr(195).chr(140) => 'I',
  chr(195).chr(141) => 'I', chr(195).chr(142) => 'I',
  chr(195).chr(143) => 'I', chr(195).chr(145) => 'N',
  chr(195).chr(146) => 'O', chr(195).chr(147) => 'O',
  chr(195).chr(148) => 'O', chr(195).chr(149) => 'O',
  chr(195).chr(150) => 'O', chr(195).chr(153) => 'U',
  chr(195).chr(154) => 'U', chr(195).chr(155) => 'U',
  chr(195).chr(156) => 'U', chr(195).chr(157) => 'Y',
  chr(195).chr(159) => 's', chr(195).chr(160) => 'a',
  chr(195).chr(161) => 'a', chr(195).chr(162) => 'a',
  chr(195).chr(163) => 'a', chr(195).chr(164) => 'a',
  chr(195).chr(165) => 'a', chr(195).chr(167) => 'c',
  chr(195).chr(168) => 'e', chr(195).chr(169) => 'e',
  chr(195).chr(170) => 'e', chr(195).chr(171) => 'e',
  chr(195).chr(172) => 'i', chr(195).chr(173) => 'i',
  chr(195).chr(174) => 'i', chr(195).chr(175) => 'i',
  chr(195).chr(177) => 'n', chr(195).chr(178) => 'o',
  chr(195).chr(179) => 'o', chr(195).chr(180) => 'o',
  chr(195).chr(181) => 'o', chr(195).chr(182) => 'o',
  chr(195).chr(182) => 'o', chr(195).chr(185) => 'u',
  chr(195).chr(186) => 'u', chr(195).chr(187) => 'u',
  chr(195).chr(188) => 'u', chr(195).chr(189) => 'y',
  chr(195).chr(191) => 'y',
  // Decompositions for Latin Extended-A
  chr(196).chr(128) => 'A', chr(196).chr(129) => 'a',
  chr(196).chr(130) => 'A', chr(196).chr(131) => 'a',
  chr(196).chr(132) => 'A', chr(196).chr(133) => 'a',
  chr(196).chr(134) => 'C', chr(196).chr(135) => 'c',
  chr(196).chr(136) => 'C', chr(196).chr(137) => 'c',
  chr(196).chr(138) => 'C', chr(196).chr(139) => 'c',
  chr(196).chr(140) => 'C', chr(196).chr(141) => 'c',
  chr(196).chr(142) => 'D', chr(196).chr(143) => 'd',
  chr(196).chr(144) => 'D', chr(196).chr(145) => 'd',
  chr(196).chr(146) => 'E', chr(196).chr(147) => 'e',
  chr(196).chr(148) => 'E', chr(196).chr(149) => 'e',
  chr(196).chr(150) => 'E', chr(196).chr(151) => 'e',
  chr(196).chr(152) => 'E', chr(196).chr(153) => 'e',
  chr(196).chr(154) => 'E', chr(196).chr(155) => 'e',
  chr(196).chr(156) => 'G', chr(196).chr(157) => 'g',
  chr(196).chr(158) => 'G', chr(196).chr(159) => 'g',
  chr(196).chr(160) => 'G', chr(196).chr(161) => 'g',
  chr(196).chr(162) => 'G', chr(196).chr(163) => 'g',
  chr(196).chr(164) => 'H', chr(196).chr(165) => 'h',
  chr(196).chr(166) => 'H', chr(196).chr(167) => 'h',
  chr(196).chr(168) => 'I', chr(196).chr(169) => 'i',
  chr(196).chr(170) => 'I', chr(196).chr(171) => 'i',
  chr(196).chr(172) => 'I', chr(196).chr(173) => 'i',
  chr(196).chr(174) => 'I', chr(196).chr(175) => 'i',
  chr(196).chr(176) => 'I', chr(196).chr(177) => 'i',
  chr(196).chr(178) => 'IJ',chr(196).chr(179) => 'ij',
  chr(196).chr(180) => 'J', chr(196).chr(181) => 'j',
  chr(196).chr(182) => 'K', chr(196).chr(183) => 'k',
  chr(196).chr(184) => 'k', chr(196).chr(185) => 'L',
  chr(196).chr(186) => 'l', chr(196).chr(187) => 'L',
  chr(196).chr(188) => 'l', chr(196).chr(189) => 'L',
  chr(196).chr(190) => 'l', chr(196).chr(191) => 'L',
  chr(197).chr(128) => 'l', chr(197).chr(129) => 'L',
  chr(197).chr(130) => 'l', chr(197).chr(131) => 'N',
  chr(197).chr(132) => 'n', chr(197).chr(133) => 'N',
  chr(197).chr(134) => 'n', chr(197).chr(135) => 'N',
  chr(197).chr(136) => 'n', chr(197).chr(137) => 'N',
  chr(197).chr(138) => 'n', chr(197).chr(139) => 'N',
  chr(197).chr(140) => 'O', chr(197).chr(141) => 'o',
  chr(197).chr(142) => 'O', chr(197).chr(143) => 'o',
  chr(197).chr(144) => 'O', chr(197).chr(145) => 'o',
  chr(197).chr(146) => 'OE',chr(197).chr(147) => 'oe',
  chr(197).chr(148) => 'R',chr(197).chr(149) => 'r',
  chr(197).chr(150) => 'R',chr(197).chr(151) => 'r',
  chr(197).chr(152) => 'R',chr(197).chr(153) => 'r',
  chr(197).chr(154) => 'S',chr(197).chr(155) => 's',
  chr(197).chr(156) => 'S',chr(197).chr(157) => 's',
  chr(197).chr(158) => 'S',chr(197).chr(159) => 's',
  chr(197).chr(160) => 'S', chr(197).chr(161) => 's',
  chr(197).chr(162) => 'T', chr(197).chr(163) => 't',
  chr(197).chr(164) => 'T', chr(197).chr(165) => 't',
  chr(197).chr(166) => 'T', chr(197).chr(167) => 't',
  chr(197).chr(168) => 'U', chr(197).chr(169) => 'u',
  chr(197).chr(170) => 'U', chr(197).chr(171) => 'u',
  chr(197).chr(172) => 'U', chr(197).chr(173) => 'u',
  chr(197).chr(174) => 'U', chr(197).chr(175) => 'u',
  chr(197).chr(176) => 'U', chr(197).chr(177) => 'u',
  chr(197).chr(178) => 'U', chr(197).chr(179) => 'u',
  chr(197).chr(180) => 'W', chr(197).chr(181) => 'w',
  chr(197).chr(182) => 'Y', chr(197).chr(183) => 'y',
  chr(197).chr(184) => 'Y', chr(197).chr(185) => 'Z',
  chr(197).chr(186) => 'z', chr(197).chr(187) => 'Z',
  chr(197).chr(188) => 'z', chr(197).chr(189) => 'Z',
  chr(197).chr(190) => 'z', chr(197).chr(191) => 's',
  // Euro Sign
  chr(226).chr(130).chr(172) => 'E',
  // GBP (Pound) Sign
  chr(194).chr(163) => '');
  $string = strtr($string, $chars);
 } else {
  // Assume ISO-8859-1 if not UTF-8
  $chars['in'] = chr(128).chr(131).chr(138).chr(142).chr(154).chr(158)
   .chr(159).chr(162).chr(165).chr(181).chr(192).chr(193).chr(194)
   .chr(195).chr(196).chr(197).chr(199).chr(200).chr(201).chr(202)
   .chr(203).chr(204).chr(205).chr(206).chr(207).chr(209).chr(210)
   .chr(211).chr(212).chr(213).chr(214).chr(216).chr(217).chr(218)
   .chr(219).chr(220).chr(221).chr(224).chr(225).chr(226).chr(227)
   .chr(228).chr(229).chr(231).chr(232).chr(233).chr(234).chr(235)
   .chr(236).chr(237).chr(238).chr(239).chr(241).chr(242).chr(243)
   .chr(244).chr(245).chr(246).chr(248).chr(249).chr(250).chr(251)
   .chr(252).chr(253).chr(255);
  $chars['out'] = "EfSZszYcYuAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy";
  $string = strtr($string, $chars['in'], $chars['out']);
  $double_chars['in'] = array(chr(140), chr(156), chr(198), chr(208), chr(222), chr(223), chr(230), chr(240), chr(254));
  $double_chars['out'] = array('OE', 'oe', 'AE', 'DH', 'TH', 'ss', 'ae', 'dh', 'th');
  $string = str_replace($double_chars['in'], $double_chars['out'], $string);
 }
 return $string;
}

/**
 * Checks to see if a string is utf8 encoded.
 *
 * @author bmorel at ssi dot fr
 *
 * @param string $Str The string to be checked
 * @return bool True if $Str fits a UTF-8 model, false otherwise.
 */
function seems_utf8($Str) { # by bmorel at ssi dot fr
 $length = strlen($Str);
 for ($i = 0; $i < $length; $i++) {
  if (ord($Str[$i]) < 0x80) continue; # 0bbbbbbb
  elseif ((ord($Str[$i]) & 0xE0) == 0xC0) $n = 1; # 110bbbbb
  elseif ((ord($Str[$i]) & 0xF0) == 0xE0) $n = 2; # 1110bbbb
  elseif ((ord($Str[$i]) & 0xF8) == 0xF0) $n = 3; # 11110bbb
  elseif ((ord($Str[$i]) & 0xFC) == 0xF8) $n = 4; # 111110bb
  elseif ((ord($Str[$i]) & 0xFE) == 0xFC) $n = 5; # 1111110b
  else return false; # Does not match any model
  for ($j = 0; $j < $n; $j++) { # n bytes matching 10bbbbbb follow ?
   if ((++$i == $length) || ((ord($Str[$i]) & 0xC0) != 0x80))
   return false;
  }
 }
 return true;
}

function utf8_uri_encode($utf8_string, $length = 0) {
 $unicode = '';
 $values = array();
 $num_octets = 1;
 $unicode_length = 0;
 $string_length = strlen($utf8_string);
 for ($i = 0; $i < $string_length; $i++) {
  $value = ord($utf8_string[$i]);
  if ($value < 128) {
   if ($length && ($unicode_length >= $length))
    break;
   $unicode .= chr($value);
   $unicode_length++;
  } else {
   if (count($values) == 0) $num_octets = ($value < 224) ? 2 : 3;
   $values[] = $value;
   if ($length && ($unicode_length + ($num_octets * 3)) > $length)
    break;
   if (count( $values ) == $num_octets) {
    if ($num_octets == 3) {
     $unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]) . '%' . dechex($values[2]);
     $unicode_length += 9;
    } else {
     $unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]);
     $unicode_length += 6;
    }
    $values = array();
    $num_octets = 1;
   }
  }
 }
 return $unicode;
}

/**
 * Sanitizes title, replacing whitespace with dashes.
 *
 * Limits the output to alphanumeric characters, underscore (_) and dash (-).
 * Whitespace becomes a dash.
 *
 * @param string $title The title to be sanitized.
 * @return string The sanitized title.
 */
function slugify($title) {
 $title = strip_tags($title);
 // Preserve escaped octets.
 $title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title);
 // Remove percent signs that are not part of an octet.
 $title = str_replace('%', '', $title);
 // Restore octets.
 $title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title);
 $title = remove_accents($title);
 if (seems_utf8($title)) {
  if (function_exists('mb_strtolower')) {
   $title = mb_strtolower($title, 'UTF-8');
  }
  $title = utf8_uri_encode($title, 200);
 }
 $title = strtolower($title);
 $title = preg_replace('/&.+?;/', '', $title); // kill entities
 $title = preg_replace('/[^%a-z0-9 _-]/', '', $title);
 $title = preg_replace('/\s+/', '-', $title);
 $title = preg_replace('|-+|', '-', $title);
 $title = trim($title, '-');
 return $title;
}
5
Keyne

si vous avez http://php.net/manual/en/book.intl.php disponible, cela résoudra votre problème:

$string = "Éric Cantona";
$transliterator = Transliterator::createFromRules(':: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;', Transliterator::FORWARD);
echo $normalized = $transliterator->transliterate($string);

MODIFIER

Pour installer l'extension php dans Ubuntu:

apt-get install php-intl

N'oubliez pas, dans Composer, d'exiger l'extension ext-intl pour s’assurer qu’il s’intègre correctement dans les systèmes déployés.

5
gabo

J'ai cherché et votre idée de bande de l'accent est assez géniale et rentable, mais votre regex est mal faite et manque 2 paramètres supplémentaires. Longue histoire courte la regex doit être:

$patterns[0] = '/[áâàåä]/ui';
$patterns[1] = '/[ðéêèë]/ui';
$patterns[2] = '/[íîìï]/ui';
$patterns[3] = '/[óôòøõö]/ui';
$patterns[4] = '/[úûùü]/ui';
$patterns[5] = '/æ/ui';
$patterns[6] = '/ç/ui';
$patterns[7] = '/ß/ui';
$replacements[0] = 'a';
$replacements[1] = 'e';
$replacements[2] = 'i';
$replacements[3] = 'o';
$replacements[4] = 'u';
$replacements[5] = 'ae';
$replacements[6] = 'c';
$replacements[7] = 'ss';

Comme vous pouvez le constater, la situation est assez similaire, mais le plus important est les paramas après la deuxième barre oblique de l’expression régulière. Quand une expression regualr est comme ceci /[someCoolRegex]/ui le u indique qu’il doit utiliser unicode et le i indique que la casse est respectée, j’ai testé le mien et avec l’ansewer dans ce forum I doit dire est plus économique que d'utiliser strtr.

J'espère que quelqu'un lit cette réponse.

3
Colorman

strtolower ne fonctionne que sur les chaînes codées iso-8859-1. Vous pouvez essayer avec mb_strtolower.

Ou, si vous devez utiliser des extensions multi-octets, vous pouvez également utiliser le support de translittération d'iconv:

iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text);

Modifier:

Il semble que j'ai été un peu rapide. Vous semblez utiliser iso-8859-1, votre stratégie actuelle fonctionnera donc. Vous avez juste besoin d'écrire l'expression rationnelle correctement. Par exemple.:

'/(ð|é|ê|è|ë)/'

ne pas:

'/[ð|é|ê|è|ë]/'
2
troelskn

Je sais, cette question a été posée il y a très longtemps ...

Je recherchais une solution courte et élégante, mais je ne trouvais pas satisfaction pour deux raisons:

Premièrement, la plupart des solutions existantes remplacent une liste de caractères par une liste d’autres caractères. Malheureusement, il faut utiliser un codage spécifique pour le fichier de script php lui-même, qui pourrait être indésirable.

Deuxièmement, utiliser iconv semble être un bon moyen, mais cela ne suffit pas, le résultat d'un personnage converti pouvant être un ou deux caractères, ou une exception fatale.

Alors j'ai écrit cette petite fonction qui fait le travail:

function replaceAccent($string, $replacement = '_')
{
    $alnumPattern = '/^[a-zA-Z0-9 ]+$/';

    if (preg_match($alnumPattern, $string)) {
        return $string;
    }

    $ret = array_map(
        function ($chr) use ($alnumPattern, $replacement) {
            if (preg_match($alnumPattern, $chr)) {
                return $chr;
            } else {
                $chr = @iconv('ISO-8859-1', 'ASCII//TRANSLIT', $chr);
                if (strlen($chr) == 1) {
                    return $chr;
                } elseif (strlen($chr) > 1) {
                    $ret = '';
                    foreach (str_split($chr) as $char2) {
                        if (preg_match($alnumPattern, $char2)) {
                            $ret .= $char2;
                        }
                    }
                    return $ret;
                } else {
                    // replace whatever iconv fail to convert by something else
                    return $replacement;
                }
            }
        },
        str_split($string)
    );

    return implode($ret);
}
1
frenus

Ajoutant un peu à ce que Lizard disait, cela fonctionnait pour s'afficher correctement sur la page Web, mais ajoutait d'autres codes pour compléter ce que je cherchais pour remplacer mes balises afin de rechercher correctement dans ma base de données avec des caractères spéciaux. Merci d'avance.

$unwanted_array = array(    'Š'=>'S', 'š'=>'s', 'Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
                            'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U',
                            'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss', 'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c',
                            'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o',
                            'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y', 
                            '&#225;'=>'a', '&#233;'=>'e', '&#237;'=>'i', '&#243;'=>'o', '&#250;'=>'u',  
                            '&#193;'=>'A', '&#201;'=>'E', '&#205;'=>'I', '&#211;'=>'O', '&#218;'=>'U',
                            '&#209;'=>'N', '&#241;'=>'n' );
$newtag = strtr( $newtag, $unwanted_array );
0
Luis H Cabrejo

En guise d'alternative (un peu plus complexe par nature), regardez comment wordpress supprime les accent. Faites quelques modifications ci-dessous pour le faire fonctionner indépendamment sans faire référence à wordpress fonctions.

     function mbstring_binary_safe_encoding($reset = false)
{
    static $encodings  = array();
    static $overloaded = null;

    if (is_null($overloaded)) {
        $overloaded = function_exists('mb_internal_encoding') && (ini_get('mbstring.func_overload') & 2);
    }

    if (false === $overloaded) {
        return;
    }

    if (!$reset) {
        $encoding = mb_internal_encoding();
        array_Push($encodings, $encoding);
        mb_internal_encoding('ISO-8859-1');
    }

    if ($reset && $encodings) {
        $encoding = array_pop($encodings);
        mb_internal_encoding($encoding);
    }
}

function seems_utf8($str)
{
    mbstring_binary_safe_encoding();
    $length = strlen($str);
    mbstring_binary_safe_encoding(true);
    for ($i = 0; $i < $length; $i++) {
        $c = ord($str[$i]);
        if ($c < 0x80) {
            $n = 0;
        }
        // 0bbbbbbb
        elseif (($c & 0xE0) == 0xC0) {
            $n = 1;
        }
        // 110bbbbb
        elseif (($c & 0xF0) == 0xE0) {
            $n = 2;
        }
        // 1110bbbb
        elseif (($c & 0xF8) == 0xF0) {
            $n = 3;
        }
        // 11110bbb
        elseif (($c & 0xFC) == 0xF8) {
            $n = 4;
        }
        // 111110bb
        elseif (($c & 0xFE) == 0xFC) {
            $n = 5;
        }
        // 1111110b
        else {
                return false;
            }
            // Does not match any model
            for ($j = 0; $j < $n; $j++) {
                // n bytes matching 10bbbbbb follow ?
                if ((++$i == $length) || ((ord($str[$i]) & 0xC0) != 0x80)) {
                    return false;
                }

            }
        }
        return true;
    }

    function remove_accents($string)
{
        if (!preg_match('/[\x80-\xff]/', $string)) {
            return $string;
        }

        if (seems_utf8($string)) {
            $chars = array(
                // Decompositions for Latin-1 Supplement
                'ª' => 'a', 'º'  => 'o',
                'À' => 'A', 'Á'  => 'A',
                'Â' => 'A', 'Ã'  => 'A',
                'Ä' => 'A', 'Å'  => 'A',
                'Æ' => 'AE', 'Ç' => 'C',
                'È' => 'E', 'É'  => 'E',
                'Ê' => 'E', 'Ë'  => 'E',
                'Ì' => 'I', 'Í'  => 'I',
                'Î' => 'I', 'Ï'  => 'I',
                'Ð' => 'D', 'Ñ'  => 'N',
                'Ò' => 'O', 'Ó'  => 'O',
                'Ô' => 'O', 'Õ'  => 'O',
                'Ö' => 'O', 'Ù'  => 'U',
                'Ú' => 'U', 'Û'  => 'U',
                'Ü' => 'U', 'Ý'  => 'Y',
                'Þ' => 'TH', 'ß' => 's',
                'à' => 'a', 'á'  => 'a',
                'â' => 'a', 'ã'  => 'a',
                'ä' => 'a', 'å'  => 'a',
                'æ' => 'ae', 'ç' => 'c',
                'è' => 'e', 'é'  => 'e',
                'ê' => 'e', 'ë'  => 'e',
                'ì' => 'i', 'í'  => 'i',
                'î' => 'i', 'ï'  => 'i',
                'ð' => 'd', 'ñ'  => 'n',
                'ò' => 'o', 'ó'  => 'o',
                'ô' => 'o', 'õ'  => 'o',
                'ö' => 'o', 'ø'  => 'o',
                'ù' => 'u', 'ú'  => 'u',
                'û' => 'u', 'ü'  => 'u',
                'ý' => 'y', 'þ'  => 'th',
                'ÿ' => 'y', 'Ø'  => 'O',
                // Decompositions for Latin Extended-A
                'Ā' => 'A', 'ā'  => 'a',
                'Ă' => 'A', 'ă'  => 'a',
                'Ą' => 'A', 'ą'  => 'a',
                'Ć' => 'C', 'ć'  => 'c',
                'Ĉ' => 'C', 'ĉ'  => 'c',
                'Ċ' => 'C', 'ċ'  => 'c',
                'Č' => 'C', 'č'  => 'c',
                'Ď' => 'D', 'ď'  => 'd',
                'Đ' => 'D', 'đ'  => 'd',
                'Ē' => 'E', 'ē'  => 'e',
                'Ĕ' => 'E', 'ĕ'  => 'e',
                'Ė' => 'E', 'ė'  => 'e',
                'Ę' => 'E', 'ę'  => 'e',
                'Ě' => 'E', 'ě'  => 'e',
                'Ĝ' => 'G', 'ĝ'  => 'g',
                'Ğ' => 'G', 'ğ'  => 'g',
                'Ġ' => 'G', 'ġ'  => 'g',
                'Ģ' => 'G', 'ģ'  => 'g',
                'Ĥ' => 'H', 'ĥ'  => 'h',
                'Ħ' => 'H', 'ħ'  => 'h',
                'Ĩ' => 'I', 'ĩ'  => 'i',
                'Ī' => 'I', 'ī'  => 'i',
                'Ĭ' => 'I', 'ĭ'  => 'i',
                'Į' => 'I', 'į'  => 'i',
                'İ' => 'I', 'ı'  => 'i',
                'IJ' => 'IJ', 'ij' => 'ij',
                'Ĵ' => 'J', 'ĵ'  => 'j',
                'Ķ' => 'K', 'ķ'  => 'k',
                'ĸ' => 'k', 'Ĺ'  => 'L',
                'ĺ' => 'l', 'Ļ'  => 'L',
                'ļ' => 'l', 'Ľ'  => 'L',
                'ľ' => 'l', 'Ŀ'  => 'L',
                'ŀ' => 'l', 'Ł'  => 'L',
                'ł' => 'l', 'Ń'  => 'N',
                'ń' => 'n', 'Ņ'  => 'N',
                'ņ' => 'n', 'Ň'  => 'N',
                'ň' => 'n', 'ʼn'  => 'n',
                'Ŋ' => 'N', 'ŋ'  => 'n',
                'Ō' => 'O', 'ō'  => 'o',
                'Ŏ' => 'O', 'ŏ'  => 'o',
                'Ő' => 'O', 'ő'  => 'o',
                'Œ' => 'OE', 'œ' => 'oe',
                'Ŕ' => 'R', 'ŕ'  => 'r',
                'Ŗ' => 'R', 'ŗ'  => 'r',
                'Ř' => 'R', 'ř'  => 'r',
                'Ś' => 'S', 'ś'  => 's',
                'Ŝ' => 'S', 'ŝ'  => 's',
                'Ş' => 'S', 'ş'  => 's',
                'Š' => 'S', 'š'  => 's',
                'Ţ' => 'T', 'ţ'  => 't',
                'Ť' => 'T', 'ť'  => 't',
                'Ŧ' => 'T', 'ŧ'  => 't',
                'Ũ' => 'U', 'ũ'  => 'u',
                'Ū' => 'U', 'ū'  => 'u',
                'Ŭ' => 'U', 'ŭ'  => 'u',
                'Ů' => 'U', 'ů'  => 'u',
                'Ű' => 'U', 'ű'  => 'u',
                'Ų' => 'U', 'ų'  => 'u',
                'Ŵ' => 'W', 'ŵ'  => 'w',
                'Ŷ' => 'Y', 'ŷ'  => 'y',
                'Ÿ' => 'Y', 'Ź'  => 'Z',
                'ź' => 'z', 'Ż'  => 'Z',
                'ż' => 'z', 'Ž'  => 'Z',
                'ž' => 'z', 'ſ'  => 's',
                // Decompositions for Latin Extended-B
                'Ș' => 'S', 'ș'  => 's',
                'Ț' => 'T', 'ț'  => 't',
                // Euro Sign
                '€' => 'E',
                // GBP (Pound) Sign
                '£' => '',
                // Vowels with diacritic (Vietnamese)
                // unmarked
                'Ơ' => 'O', 'ơ'  => 'o',
                'Ư' => 'U', 'ư'  => 'u',
                // Grave accent
                'Ầ' => 'A', 'ầ'  => 'a',
                'Ằ' => 'A', 'ằ'  => 'a',
                'Ề' => 'E', 'ề'  => 'e',
                'Ồ' => 'O', 'ồ'  => 'o',
                'Ờ' => 'O', 'ờ'  => 'o',
                'Ừ' => 'U', 'ừ'  => 'u',
                'Ỳ' => 'Y', 'ỳ'  => 'y',
                // hook
                'Ả' => 'A', 'ả'  => 'a',
                'Ẩ' => 'A', 'ẩ'  => 'a',
                'Ẳ' => 'A', 'ẳ'  => 'a',
                'Ẻ' => 'E', 'ẻ'  => 'e',
                'Ể' => 'E', 'ể'  => 'e',
                'Ỉ' => 'I', 'ỉ'  => 'i',
                'Ỏ' => 'O', 'ỏ'  => 'o',
                'Ổ' => 'O', 'ổ'  => 'o',
                'Ở' => 'O', 'ở'  => 'o',
                'Ủ' => 'U', 'ủ'  => 'u',
                'Ử' => 'U', 'ử'  => 'u',
                'Ỷ' => 'Y', 'ỷ'  => 'y',
                // tilde
                'Ẫ' => 'A', 'ẫ'  => 'a',
                'Ẵ' => 'A', 'ẵ'  => 'a',
                'Ẽ' => 'E', 'ẽ'  => 'e',
                'Ễ' => 'E', 'ễ'  => 'e',
                'Ỗ' => 'O', 'ỗ'  => 'o',
                'Ỡ' => 'O', 'ỡ'  => 'o',
                'Ữ' => 'U', 'ữ'  => 'u',
                'Ỹ' => 'Y', 'ỹ'  => 'y',
                // acute accent
                'Ấ' => 'A', 'ấ'  => 'a',
                'Ắ' => 'A', 'ắ'  => 'a',
                'Ế' => 'E', 'ế'  => 'e',
                'Ố' => 'O', 'ố'  => 'o',
                'Ớ' => 'O', 'ớ'  => 'o',
                'Ứ' => 'U', 'ứ'  => 'u',
                // dot below
                'Ạ' => 'A', 'ạ'  => 'a',
                'Ậ' => 'A', 'ậ'  => 'a',
                'Ặ' => 'A', 'ặ'  => 'a',
                'Ẹ' => 'E', 'ẹ'  => 'e',
                'Ệ' => 'E', 'ệ'  => 'e',
                'Ị' => 'I', 'ị'  => 'i',
                'Ọ' => 'O', 'ọ'  => 'o',
                'Ộ' => 'O', 'ộ'  => 'o',
                'Ợ' => 'O', 'ợ'  => 'o',
                'Ụ' => 'U', 'ụ'  => 'u',
                'Ự' => 'U', 'ự'  => 'u',
                'Ỵ' => 'Y', 'ỵ'  => 'y',
                // Vowels with diacritic (Chinese, Hanyu Pinyin)
                'ɑ' => 'a',
                // macron
                'Ǖ' => 'U', 'ǖ'  => 'u',
                // acute accent
                'Ǘ' => 'U', 'ǘ'  => 'u',
                // caron
                'Ǎ' => 'A', 'ǎ'  => 'a',
                'Ǐ' => 'I', 'ǐ'  => 'i',
                'Ǒ' => 'O', 'ǒ'  => 'o',
                'Ǔ' => 'U', 'ǔ'  => 'u',
                'Ǚ' => 'U', 'ǚ'  => 'u',
                // Grave accent
                'Ǜ' => 'U', 'ǜ'  => 'u',
            );

            $string = strtr($string, $chars);
        } else {
            $chars = array();
            // Assume ISO-8859-1 if not UTF-8
            $chars['in'] = "\x80\x83\x8a\x8e\x9a\x9e"
                . "\x9f\xa2\xa5\xb5\xc0\xc1\xc2"
                . "\xc3\xc4\xc5\xc7\xc8\xc9\xca"
                . "\xcb\xcc\xcd\xce\xcf\xd1\xd2"
                . "\xd3\xd4\xd5\xd6\xd8\xd9\xda"
                . "\xdb\xdc\xdd\xe0\xe1\xe2\xe3"
                . "\xe4\xe5\xe7\xe8\xe9\xea\xeb"
                . "\xec\xed\xee\xef\xf1\xf2\xf3"
                . "\xf4\xf5\xf6\xf8\xf9\xfa\xfb"
                . "\xfc\xfd\xff";

            $chars['out'] = "EfSZszYcYuAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy";

            $string              = strtr($string, $chars['in'], $chars['out']);
            $double_chars        = array();
            $double_chars['in']  = array("\x8c", "\x9c", "\xc6", "\xd0", "\xde", "\xdf", "\xe6", "\xf0", "\xfe");
            $double_chars['out'] = array('OE', 'oe', 'AE', 'DH', 'TH', 'ss', 'ae', 'dh', 'th');
            $string              = str_replace($double_chars['in'], $double_chars['out'], $string);
        }

        return $string;
    }
0
Selay