PHP function to make a slug (URL string)

Question

function gen_slug($str){
    # special accents
    $a = array('À','Á','Â','Ã','Ä','Å','Æ','Ç','È','É','Ê','Ë','Ì','Í','Î','Ï','Ð','Ñ','Ò','Ó','Ô','Õ','Ö','Ø','Ù','Ú','Û','Ü','Ý','ß','à','á','â','ã','ä','å','æ','ç','è','é','ê','ë','ì','í','î','ï','ñ','ò','ó','ô','õ','ö','ø','ù','ú','û','ü','ý','ÿ','A','a','A','a','A','a','C','c','C','c','C','c','C','c','D','d','Ð','d','E','e','E','e','E','e','E','e','E','e','G','g','G','g','G','g','G','g','H','h','H','h','I','i','I','i','I','i','I','i','I','i','?','?','J','j','K','k','L','l','L','l','L','l','?','?','L','l','N','n','N','n','N','n','?','O','o','O','o','O','o','Œ','œ','R','r','R','r','R','r','S','s','S','s','S','s','Š','š','T','t','T','t','T','t','U','u','U','u','U','u','U','u','U','u','U','u','W','w','Y','y','Ÿ','Z','z','Z','z','Ž','ž','?','ƒ','O','o','U','u','A','a','I','i','O','o','U','u','U','u','U','u','U','u','U','u','?','?','?','?','?','?');
    $b = array('A','A','A','A','A','A','AE','C','E','E','E','E','I','I','I','I','D','N','O','O','O','O','O','O','U','U','U','U','Y','s','a','a','a','a','a','a','ae','c','e','e','e','e','i','i','i','i','n','o','o','o','o','o','o','u','u','u','u','y','y','A','a','A','a','A','a','C','c','C','c','C','c','C','c','D','d','D','d','E','e','E','e','E','e','E','e','E','e','G','g','G','g','G','g','G','g','H','h','H','h','I','i','I','i','I','i','I','i','I','i','IJ','ij','J','j','K','k','L','l','L','l','L','l','L','l','l','l','N','n','N','n','N','n','n','O','o','O','o','O','o','OE','oe','R','r','R','r','R','r','S','s','S','s','S','s','S','s','T','t','T','t','T','t','U','u','U','u','U','u','U','u','U','u','U','u','W','w','Y','y','Y','Z','z','Z','z','Z','z','s','f','O','o','U','u','A','a','I','i','O','o','U','u','U','u','U','u','U','u','U','u','A','a','AE','ae','O','o');
    return strtolower(preg_replace(array('/[^a-zA-Z0-9 -]/','/[ -]+/','/^-|-$/'),array('','-',''),str_replace($a,$b,$str)));
}

It works great, but I found some cases where it fails:

gen_slug('andrés') returns andras instead of andres

Why? Any ideas about the preg_replace parameters?
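One plausible explanation for the andras result, offered here as an assumption rather than a confirmed diagnosis: str_replace compares raw bytes, so if the lookup table is stored in a single-byte encoding such as ISO-8859-1 while the input arrives as UTF-8, the first byte of é (0xC3) matches the table entry for Ã and the leftover byte is stripped by the regex. The sketch below reproduces that effect with a hypothetical two-entry table written as explicit byte escapes.

```php
<?php
// Hypothetical two-entry table, written as ISO-8859-1 bytes:
// "\xC3" is 'Ã' and "\xE9" is 'é' in that encoding.
$from = array("\xC3", "\xE9");
$to   = array('A', 'e');

// 'andrés' encoded as UTF-8: 'é' is the two bytes 0xC3 0xA9.
$utf8Input = "andr\xC3\xA9s";

// str_replace works on bytes, so 0xC3 is taken for 'Ã' and becomes 'A'.
$replaced = str_replace($from, $to, $utf8Input);

// The leftover 0xA9 byte is not in [a-zA-Z0-9 -], so it is dropped.
$slug = strtolower(preg_replace('/[^a-zA-Z0-9 -]/', '', $replaced));

echo $slug, "\n"; // andras
```

If this is indeed the cause, saving the script as UTF-8 (so the table and the input share one encoding) would be the first thing to check.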

Maerlyn · Accepted Answer

Instead of a lengthy replace, try this one:

public static function slugify($text)
{
    // replace non letter or digits by -
    $text = preg_replace('~[^\pL\d]+~u', '-', $text);

    // transliterate
    $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);

    // remove unwanted characters
    $text = preg_replace('~[^-\w]+~', '', $text);

    // trim
    $text = trim($text, '-');

    // remove duplicate -
    $text = preg_replace('~-+~', '-', $text);

    // lowercase
    $text = strtolower($text);

    if (empty($text)) {
        return 'n-a';
    }

    return $text;
}

This was based on the one in Symfony's Jobeet tutorial.

TheKalpit · Answer

Update

Since this answer is getting some attention, I'm adding some explanation.

The solution provided will essentially replace everything except A-Z, a-z, 0-9 and - (hyphen) with - (hyphen). So it won't work properly with other Unicode characters (which are valid characters for a URL slug/string). A common scenario is when the input string contains non-English characters.

Only use this solution if you're confident that the input string won't contain Unicode characters that you might want to be part of the output/slug.

E.g. "नारी शक्ति" will become "-" (nothing but a hyphen, because the + collapses the whole run of non-matching bytes) instead of "नारी-शक्ति" (a valid URL slug).

Original answer

How about...

$slug = strtolower(trim(preg_replace('/[^A-Za-z0-9-]+/', '-', $string)));

?
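To make the limitation concrete, here is a minimal sketch that puts the ASCII-only one-liner next to a Unicode-aware variant. The names ascii_slug and unicode_slug are hypothetical, and the second function is only one possible approach, not part of the answer: it assumes you want to keep letters, combining marks, and digits from any script.

```php
<?php
// The ASCII-only one-liner from the answer: every byte outside A-Z, a-z, 0-9
// and '-' belongs to a run that collapses to a single hyphen.
function ascii_slug($string)
{
    return strtolower(trim(preg_replace('/[^A-Za-z0-9-]+/', '-', $string)));
}

// Hypothetical Unicode-aware variant: keep letters (\pL), combining marks (\pM)
// and numbers (\pN) from any script. The /u flag makes PCRE read UTF-8.
function unicode_slug($string)
{
    $slug = preg_replace('~[^\pL\pM\pN]+~u', '-', $string);
    $slug = trim($slug, '-');

    // Prefer the multibyte-aware lowercase when mbstring is available.
    return function_exists('mb_strtolower') ? mb_strtolower($slug, 'UTF-8') : strtolower($slug);
}

echo ascii_slug('नारी शक्ति'), "\n";     // -
echo unicode_slug('नारी शक्ति'), "\n";   // नारी-शक्ति
echo unicode_slug('Hello World!'), "\n"; // hello-world
```

Including \pM matters for scripts such as Devanagari, where vowel signs are combining marks rather than letters.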

hdogan · Answer

If you have the intl extension installed, you can use the transliterator_transliterate function to easily create a slug.

You can replace the spaces with dashes afterwards to make it more slug-like.

<?php
$string = "andrés";
$string = transliterator_transliterate("Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; [:Punctuation:] Remove; Lower();", $string);
echo $string;
?>
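The space-to-dash step mentioned above could look like the following sketch. The wrapper name intl_slug and the $separator parameter are hypothetical; the rule string is the one from the answer, and the call is guarded because the intl extension may not be loaded.

```php
<?php
// Hypothetical wrapper around the transliterator call from the answer.
// Requires the intl extension.
function intl_slug($string, $separator = '-')
{
    $rules = 'Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; [:Punctuation:] Remove; Lower();';
    $string = transliterator_transliterate($rules, $string);

    // Collapse each run of whitespace into one separator.
    return preg_replace('/\s+/u', $separator, trim($string));
}

if (function_exists('transliterator_transliterate')) {
    echo intl_slug('andrés'), "\n";
    echo intl_slug('Hello World'), "\n";
}
```

Collapsing whitespace with a regex rather than a plain str_replace avoids doubled separators when punctuation was removed between two spaces.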

Imran Omar Bukhsh · Answer

Note: I took this from WordPress and it works!!

Use it like this:

echo sanitize('testing this link');

Code

//taken from wordpress
function utf8_uri_encode( $utf8_string, $length = 0 ) {
    $unicode = '';
    $values = array();
    $num_octets = 1;
    $unicode_length = 0;

    $string_length = strlen( $utf8_string );
    for ($i = 0; $i < $string_length; $i++ ) {

        $value = ord( $utf8_string[ $i ] );

        if ( $value < 128 ) {
            if ( $length && ( $unicode_length >= $length ) )
                break;
            $unicode .= chr($value);
            $unicode_length++;
        } else {
            if ( count( $values ) == 0 ) $num_octets = ( $value < 224 ) ? 2 : 3;

            $values[] = $value;

            if ( $length && ( $unicode_length + ($num_octets * 3) ) > $length )
                break;
            if ( count( $values ) == $num_octets ) {
                if ($num_octets == 3) {
                    $unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]) . '%' . dechex($values[2]);
                    $unicode_length += 9;
                } else {
                    $unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]);
                    $unicode_length += 6;
                }

                $values = array();
                $num_octets = 1;
            }
        }
    }

    return $unicode;
}

//taken from wordpress
function seems_utf8($str) {
    $length = strlen($str);
    for ($i=0; $i < $length; $i++) {
        $c = ord($str[$i]);
        if ($c < 0x80) $n = 0; # 0bbbbbbb
        elseif (($c & 0xE0) == 0xC0) $n=1; # 110bbbbb
        elseif (($c & 0xF0) == 0xE0) $n=2; # 1110bbbb
        elseif (($c & 0xF8) == 0xF0) $n=3; # 11110bbb
        elseif (($c & 0xFC) == 0xF8) $n=4; # 111110bb
        elseif (($c & 0xFE) == 0xFC) $n=5; # 1111110b
        else return false; # Does not match any model
        for ($j=0; $j<$n; $j++) { # n bytes matching 10bbbbbb follow ?
            if ((++$i == $length) || ((ord($str[$i]) & 0xC0) != 0x80))
                return false;
        }
    }
    return true;
}

//function sanitize_title_with_dashes taken from wordpress
function sanitize($title) {
    $title = strip_tags($title);
    // Preserve escaped octets.
    $title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title);
    // Remove percent signs that are not part of an octet.
    $title = str_replace('%', '', $title);
    // Restore octets.
    $title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title);

    if (seems_utf8($title)) {
        if (function_exists('mb_strtolower')) {
            $title = mb_strtolower($title, 'UTF-8');
        }
        $title = utf8_uri_encode($title, 200);
    }

    $title = strtolower($title);
    $title = preg_replace('/&.+?;/', '', $title); // kill entities
    $title = str_replace('.', '-', $title);
    $title = preg_replace('/[^%a-z0-9 _-]/', '', $title);
    $title = preg_replace('/\s+/', '-', $title);
    $title = preg_replace('|-+|', '-', $title);
    $title = trim($title, '-');

    return $title;
}

Baptiste Gaillard · Answer

Here is another one; for example, "Title with strange characters ééé A X Z" becomes "title-with-strange-characters-eee-a-x-z".

/**
 * Function used to create a slug associated to an "ugly" string.
 *
 * @param string $string the string to transform.
 *
 * @return string the resulting slug.
 */
public static function createSlug($string) {

    $table = array(
        'Š'=>'S', 'š'=>'s', 'Đ'=>'Dj', 'đ'=>'dj', 'Ž'=>'Z', 'ž'=>'z', 'Č'=>'C', 'č'=>'c', 'Ć'=>'C', 'ć'=>'c',
        'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
        'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O',
        'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss',
        'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e',
        'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o',
        'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'ý'=>'y', 'þ'=>'b',
        'ÿ'=>'y', 'Ŕ'=>'R', 'ŕ'=>'r', '/' => '-', ' ' => '-'
    );

    // -- Remove duplicated spaces
    $stripped = preg_replace(array('/\s{2,}/', '/[\t\n]/'), ' ', $string);

    // -- Returns the slug (built from the whitespace-normalized string)
    return strtolower(strtr($stripped, $table));
}

czerasz · Answer

An updated version of @Imran Omar Bukhsh's code (from the latest WordPress (4.0) branch):

<?php

// Add methods to slugify taken from Wordpress:
// - https://github.com/WordPress/WordPress/blob/master/wp-includes/formatting.php
// - https://github.com/WordPress/WordPress/blob/master/wp-includes/functions.php

/**
 * Set the mbstring internal encoding to a binary safe encoding when func_overload
 * is enabled.
 *
 * When mbstring.func_overload is in use for multi-byte encodings, the results from
 * strlen() and similar functions respect the utf8 characters, causing binary data
 * to return incorrect lengths.
 *
 * This function overrides the mbstring encoding to a binary-safe encoding, and
 * resets it to the users expected encoding afterwards through the
 * `reset_mbstring_encoding` function.
 *
 * It is safe to recursively call this function, however each
 * `mbstring_binary_safe_encoding()` call must be followed up with an equal number
 * of `reset_mbstring_encoding()` calls.
 *
 * @since 3.7.0
 *
 * @see reset_mbstring_encoding()
 *
 * @param bool $reset Optional. Whether to reset the encoding back to a previously-set encoding.
 *                    Default false.
 */
function mbstring_binary_safe_encoding( $reset = false ) {
    static $encodings = array();
    static $overloaded = null;

    if ( is_null( $overloaded ) )
        $overloaded = function_exists( 'mb_internal_encoding' ) && ( ini_get( 'mbstring.func_overload' ) & 2 );

    if ( false === $overloaded )
        return;

    if ( ! $reset ) {
        $encoding = mb_internal_encoding();
        array_push( $encodings, $encoding );
        mb_internal_encoding( 'ISO-8859-1' );
    }

    if ( $reset && $encodings ) {
        $encoding = array_pop( $encodings );
        mb_internal_encoding( $encoding );
    }
}

/**
 * Reset the mbstring internal encoding to a users previously set encoding.
 *
 * @see mbstring_binary_safe_encoding()
 *
 * @since 3.7.0
 */
function reset_mbstring_encoding() {
    mbstring_binary_safe_encoding( true );
}

/**
 * Checks to see if a string is utf8 encoded.
 *
 * NOTE: This function checks for 5-Byte sequences, UTF8
 *       has Bytes Sequences with a maximum length of 4.
 *
 * @author bmorel at ssi dot fr (modified)
 * @since 1.2.1
 *
 * @param string $str The string to be checked
 * @return bool True if $str fits a UTF-8 model, false otherwise.
 */
function seems_utf8($str) {
    mbstring_binary_safe_encoding();
    $length = strlen($str);
    reset_mbstring_encoding();
    for ($i=0; $i < $length; $i++) {
        $c = ord($str[$i]);
        if ($c < 0x80) $n = 0; # 0bbbbbbb
        elseif (($c & 0xE0) == 0xC0) $n=1; # 110bbbbb
        elseif (($c & 0xF0) == 0xE0) $n=2; # 1110bbbb
        elseif (($c & 0xF8) == 0xF0) $n=3; # 11110bbb
        elseif (($c & 0xFC) == 0xF8) $n=4; # 111110bb
        elseif (($c & 0xFE) == 0xFC) $n=5; # 1111110b
        else return false; # Does not match any model
        for ($j=0; $j<$n; $j++) { # n bytes matching 10bbbbbb follow ?
            if ((++$i == $length) || ((ord($str[$i]) & 0xC0) != 0x80))
                return false;
        }
    }
    return true;
}

/**
 * Encode the Unicode values to be used in the URI.
 *
 * @since 1.5.0
 *
 * @param string $utf8_string
 * @param int $length Max length of the string
 * @return string String with Unicode encoded for URI.
 */
function utf8_uri_encode( $utf8_string, $length = 0 ) {
    $unicode = '';
    $values = array();
    $num_octets = 1;
    $unicode_length = 0;

    mbstring_binary_safe_encoding();
    $string_length = strlen( $utf8_string );
    reset_mbstring_encoding();

    for ($i = 0; $i < $string_length; $i++ ) {

        $value = ord( $utf8_string[ $i ] );

        if ( $value < 128 ) {
            if ( $length && ( $unicode_length >= $length ) )
                break;
            $unicode .= chr($value);
            $unicode_length++;
        } else {
            if ( count( $values ) == 0 ) $num_octets = ( $value < 224 ) ? 2 : 3;

            $values[] = $value;

            if ( $length && ( $unicode_length + ($num_octets * 3) ) > $length )
                break;
            if ( count( $values ) == $num_octets ) {
                if ($num_octets == 3) {
                    $unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]) . '%' . dechex($values[2]);
                    $unicode_length += 9;
                } else {
                    $unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]);
                    $unicode_length += 6;
                }

                $values = array();
                $num_octets = 1;
            }
        }
    }

    return $unicode;
}

/**
 * Sanitizes a title, replacing whitespace and a few other characters with dashes.
 *
 * Limits the output to alphanumeric characters, underscore (_) and dash (-).
 * Whitespace becomes a dash.
 *
 * @since 1.2.0
 *
 * @param string $title The title to be sanitized.
 * @param string $raw_title Optional. Not used.
 * @param string $context Optional. The operation for which the string is sanitized.
 * @return string The sanitized title.
 */
function sanitize_title_with_dashes( $title, $raw_title = '', $context = 'display' ) {
    $title = strip_tags($title);
    // Preserve escaped octets.
    $title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title);
    // Remove percent signs that are not part of an octet.
    $title = str_replace('%', '', $title);
    // Restore octets.
    $title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title);

    if (seems_utf8($title)) {
        if (function_exists('mb_strtolower')) {
            $title = mb_strtolower($title, 'UTF-8');
        }
        $title = utf8_uri_encode($title, 200);
    }

    $title = strtolower($title);
    $title = preg_replace('/&.+?;/', '', $title); // kill entities
    $title = str_replace('.', '-', $title);

    if ( 'save' == $context ) {
        // Convert nbsp, ndash and mdash to hyphens
        $title = str_replace( array( '%c2%a0', '%e2%80%93', '%e2%80%94' ), '-', $title );

        // Strip these characters entirely
        $title = str_replace( array(
            // iexcl and iquest
            '%c2%a1', '%c2%bf',
            // angle quotes
            '%c2%ab', '%c2%bb', '%e2%80%b9', '%e2%80%ba',
            // curly quotes
            '%e2%80%98', '%e2%80%99', '%e2%80%9c', '%e2%80%9d',
            '%e2%80%9a', '%e2%80%9b', '%e2%80%9e', '%e2%80%9f',
            // copy, reg, deg, hellip and trade
            '%c2%a9', '%c2%ae', '%c2%b0', '%e2%80%a6', '%e2%84%a2',
            // acute accents
            '%c2%b4', '%cb%8a', '%cc%81', '%cd%81',
            // Grave accent, macron, caron
            '%cc%80', '%cc%84', '%cc%8c',
        ), '', $title );

        // Convert times to x
        $title = str_replace( '%c3%97', 'x', $title );
    }

    $title = preg_replace('/[^%a-z0-9 _-]/', '', $title);
    $title = preg_replace('/\s+/', '-', $title);
    $title = preg_replace('|-+|', '-', $title);
    $title = trim($title, '-');

    return $title;
}

$title = '#PFW Alexander McQueen Spring/Summer 2015';
echo "title -> slug: \n". $title ." -> ". sanitize_title_with_dashes($title);
echo "\n\n";
$title = '«GQ»: Elyas M\'Barek gehört zu Männern des Jahres';
echo "title -> slug: \n". $title ." -> ". sanitize_title_with_dashes($title);

See the online example.

Entendu · Answer

Don't use preg_replace for this. There is a PHP function built just for the task: strtr() http://php.net/manual/en/function.strtr.php

Taken from the comments in the link above (and I tested it myself; it works):

function normalize ($string) {
    $table = array(
        'Š'=>'S', 'š'=>'s', 'Đ'=>'Dj', 'đ'=>'dj', 'Ž'=>'Z', 'ž'=>'z', 'Č'=>'C', 'č'=>'c', 'Ć'=>'C', 'ć'=>'c',
        'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
        'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O',
        'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss',
        'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e',
        'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o',
        'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'ý'=>'y', 'þ'=>'b',
        'ÿ'=>'y', 'Ŕ'=>'R', 'ŕ'=>'r',
    );

    return strtr($string, $table);
}
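One concrete difference between strtr with an array and str_replace with two arrays is how they treat text that has already been replaced. The toy table below is an illustration only (it is not taken from the answer): one replacement value is also a search key.

```php
<?php
// Two-entry table in which one replacement is also a search key.
$table = array('a' => 'b', 'b' => 'c');

// str_replace applies the pairs one after another, so the 'b' produced by the
// first pair is replaced again by the second pair.
echo str_replace(array_keys($table), array_values($table), 'ab'), "\n"; // cc

// strtr scans the input once and never re-translates text it has already
// replaced.
echo strtr('ab', $table), "\n"; // bc
```

With a large accent table this single-pass behaviour makes the result independent of the order of the entries.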

Vazgen Manukyan · Answer

It is always a good idea to use existing solutions that are supported by a lot of high-level developers. The most popular one is https://github.com/cocur/slugify. First of all, it supports more than one language, and it is being actively updated.

If you do not want to use the whole package, you can copy the part that you need.

Mladen Janjetovic · Answer

I am using:

function slugify($text)
{
    $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
    return strtolower(preg_replace('/[^A-Za-z0-9-]+/', '-', $text));
}

The only drawback is that Cyrillic characters will not be converted, and I am now looking for a solution that is not a long str_replace for every single Cyrillic character.

Nady Shalaby · Answer

public static function slugify ($text) {

    $replace = [
        '&lt;' => '', '&gt;' => '', '&#039;' => '', '&amp;' => '', '&quot;' => '',
        'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä'=> 'Ae', '&Auml;' => 'A', 'Å' => 'A', 'Ā' => 'A', 'Ą' => 'A', 'Ă' => 'A', 'Æ' => 'Ae',
        'Ç' => 'C', 'Ć' => 'C', 'Č' => 'C', 'Ĉ' => 'C', 'Ċ' => 'C', 'Ď' => 'D', 'Đ' => 'D', 'Ð' => 'D',
        'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'Ē' => 'E', 'Ę' => 'E', 'Ě' => 'E', 'Ĕ' => 'E', 'Ė' => 'E',
        'Ĝ' => 'G', 'Ğ' => 'G', 'Ġ' => 'G', 'Ģ' => 'G', 'Ĥ' => 'H', 'Ħ' => 'H',
        'Ì' => 'I', 'Í' => 'I', 'Î' => 'I', 'Ï' => 'I', 'Ī' => 'I', 'Ĩ' => 'I', 'Ĭ' => 'I', 'Į' => 'I', 'İ' => 'I', 'Ĳ' => 'IJ',
        // The L-like letters below mapped to 'K' in the original; 'L' matches the lowercase entries
        'Ĵ' => 'J', 'Ķ' => 'K', 'Ł' => 'L', 'Ľ' => 'L', 'Ĺ' => 'L', 'Ļ' => 'L', 'Ŀ' => 'L',
        'Ñ' => 'N', 'Ń' => 'N', 'Ň' => 'N', 'Ņ' => 'N', 'Ŋ' => 'N',
        'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O', 'Ö' => 'Oe', '&Ouml;' => 'Oe', 'Ø' => 'O', 'Ō' => 'O', 'Ő' => 'O', 'Ŏ' => 'O', 'Œ' => 'OE',
        'Ŕ' => 'R', 'Ř' => 'R', 'Ŗ' => 'R', 'Ś' => 'S', 'Š' => 'S', 'Ş' => 'S', 'Ŝ' => 'S', 'Ș' => 'S',
        'Ť' => 'T', 'Ţ' => 'T', 'Ŧ' => 'T', 'Ț' => 'T',
        'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'Ue', 'Ū' => 'U', '&Uuml;' => 'Ue', 'Ů' => 'U', 'Ű' => 'U', 'Ŭ' => 'U', 'Ũ' => 'U', 'Ų' => 'U',
        'Ŵ' => 'W', 'Ý' => 'Y', 'Ŷ' => 'Y', 'Ÿ' => 'Y', 'Ź' => 'Z', 'Ž' => 'Z', 'Ż' => 'Z', 'Þ' => 'T',
        'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'ae', '&auml;' => 'ae', 'å' => 'a', 'ā' => 'a', 'ą' => 'a', 'ă' => 'a', 'æ' => 'ae',
        'ç' => 'c', 'ć' => 'c', 'č' => 'c', 'ĉ' => 'c', 'ċ' => 'c', 'ď' => 'd', 'đ' => 'd', 'ð' => 'd',
        'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e', 'ē' => 'e', 'ę' => 'e', 'ě' => 'e', 'ĕ' => 'e', 'ė' => 'e', 'ƒ' => 'f',
        'ĝ' => 'g', 'ğ' => 'g', 'ġ' => 'g', 'ģ' => 'g', 'ĥ' => 'h', 'ħ' => 'h',
        'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ī' => 'i', 'ĩ' => 'i', 'ĭ' => 'i', 'į' => 'i', 'ı' => 'i', 'ĳ' => 'ij',
        'ĵ' => 'j', 'ķ' => 'k', 'ĸ' => 'k', 'ł' => 'l', 'ľ' => 'l', 'ĺ' => 'l', 'ļ' => 'l', 'ŀ' => 'l',
        'ñ' => 'n', 'ń' => 'n', 'ň' => 'n', 'ņ' => 'n', 'ŉ' => 'n', 'ŋ' => 'n',
        'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'oe', '&ouml;' => 'oe', 'ø' => 'o', 'ō' => 'o', 'ő' => 'o', 'ŏ' => 'o', 'œ' => 'oe',
        'ŕ' => 'r', 'ř' => 'r', 'ŗ' => 'r', 'š' => 's',
        'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'ue', 'ū' => 'u', '&uuml;' => 'ue', 'ů' => 'u', 'ű' => 'u', 'ŭ' => 'u', 'ũ' => 'u', 'ų' => 'u',
        'ŵ' => 'w', 'ý' => 'y', 'ÿ' => 'y', 'ŷ' => 'y', 'ž' => 'z', 'ż' => 'z', 'ź' => 'z', 'þ' => 't', 'ß' => 'ss', 'ſ' => 'ss',
        'ый' => 'iy',
        'А' => 'A', 'Б' => 'B', 'В' => 'V', 'Г' => 'G', 'Д' => 'D', 'Е' => 'E', 'Ё' => 'YO', 'Ж' => 'ZH', 'З' => 'Z', 'И' => 'I', 'Й' => 'Y',
        'К' => 'K', 'Л' => 'L', 'М' => 'M', 'Н' => 'N', 'О' => 'O', 'П' => 'P', 'Р' => 'R', 'С' => 'S', 'Т' => 'T', 'У' => 'U', 'Ф' => 'F',
        'Х' => 'H', 'Ц' => 'C', 'Ч' => 'CH', 'Ш' => 'SH', 'Щ' => 'SCH', 'Ъ' => '', 'Ы' => 'Y', 'Ь' => '', 'Э' => 'E', 'Ю' => 'YU', 'Я' => 'YA',
        'а' => 'a', 'б' => 'b', 'в' => 'v', 'г' => 'g', 'д' => 'd', 'е' => 'e', 'ё' => 'yo', 'ж' => 'zh', 'з' => 'z', 'и' => 'i', 'й' => 'y',
        'к' => 'k', 'л' => 'l', 'м' => 'm', 'н' => 'n', 'о' => 'o', 'п' => 'p', 'р' => 'r', 'с' => 's', 'т' => 't', 'у' => 'u', 'ф' => 'f',
        'х' => 'h', 'ц' => 'c', 'ч' => 'ch', 'ш' => 'sh', 'щ' => 'sch', 'ъ' => '', 'ы' => 'y', 'ь' => '', 'э' => 'e', 'ю' => 'yu', 'я' => 'ya'
    ];

    // make a human readable string
    $text = strtr($text, $replace);

    // replace non letter or digits by -
    $text = preg_replace('~[^\pL\d.]+~u', '-', $text);

    // trim
    $text = trim($text, '-');

    // remove unwanted characters
    $text = preg_replace('~[^-\w.]+~', '', $text);

    $text = strtolower($text);

    return $text;
}

Bery · Answer

What about using something that is already implemented in the core?

//Clean non UTF-8 characters
Mage::getHelper('core/string')->cleanString($str)

Or one of the core url/url rewrite methods.

Lucas Bustamante · Answer

There is a good solution here that handles special characters as well.

Texto Fantástico => texto-fantastico

function slugify( $string, $separator = '-' ) {
    $accents_regex = '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i';
    $special_cases = array( '&' => 'and', "'" => '');
    $string = mb_strtolower( trim( $string ), 'UTF-8' );
    $string = str_replace( array_keys($special_cases), array_values( $special_cases), $string );
    $string = preg_replace( $accents_regex, '$1', htmlentities( $string, ENT_QUOTES, 'UTF-8' ) );
    $string = preg_replace("/[^a-z0-9]/u", "$separator", $string);
    $string = preg_replace("/[$separator]+/u", "$separator", $string);
    return $string;
}

Author: Natxet

Serty Oan · Answer

You can take a look at Normalizer::normalize(), see here. It only needs the intl module for PHP to be loaded.
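A minimal sketch of how Normalizer::normalize() can be used for this, assuming the goal is to drop accents: decompose to NFD so that each accent becomes a separate combining mark, then remove the marks. The helper name strip_accents is hypothetical.

```php
<?php
// Hypothetical helper: decompose accented letters (NFD), then delete the
// combining marks that the decomposition exposes. Requires the intl extension.
function strip_accents($string)
{
    $decomposed = Normalizer::normalize($string, Normalizer::FORM_D);

    // normalize() returns false on invalid input; fall back to the original.
    if ($decomposed === false) {
        return $string;
    }

    return preg_replace('/\p{Mn}+/u', '', $decomposed);
}

if (class_exists('Normalizer')) {
    echo strip_accents('andrés'), "\n";
}
```

Letters that have no canonical decomposition, such as ø or ß, pass through unchanged, so this step alone does not produce pure ASCII.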

Félix O. · Answer

I didn't know which one to use, so I made a quick bench on phptester.net

<?php

// First test
// https://stackoverflow.com/a/42740874/10232729
function slugify(string $string, string $separator = '-'){
    $accents_regex = '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i';
    $special_cases = [ '&' => 'and', "'" => ''];
    $string = mb_strtolower( trim( $string ), 'UTF-8' );
    $string = str_replace( array_keys($special_cases), array_values( $special_cases), $string );
    $string = preg_replace( $accents_regex, '$1', htmlentities( $string, ENT_QUOTES, 'UTF-8' ) );
    $string = preg_replace('/[^a-z0-9]/u', $separator, $string);
    return preg_replace('/['.$separator.']+/u', $separator, $string);
}

// Second test
// https://stackoverflow.com/a/13331948/10232729
function slug(string $string, string $separator = '-'){
    $string = transliterator_transliterate('Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; [:Punctuation:] Remove; Lower();', $string);
    return str_replace(' ', $separator, $string);
}

// Third test - My choice
// https://stackoverflow.com/a/38066136/10232729
function slugbis($text){
    $replace = [
        '<' => '', '>' => '', '-' => ' ', '&' => '', '"' => '',
        'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä'=> 'Ae', 'Ä' => 'A', 'Å' => 'A', 'Ā' => 'A', 'Ą' => 'A', 'Ă' => 'A', 'Æ' => 'Ae',
        'Ç' => 'C', 'Ć' => 'C', 'Č' => 'C', 'Ĉ' => 'C', 'Ċ' => 'C', 'Ď' => 'D', 'Đ' => 'D', 'Ð' => 'D',
        'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'Ē' => 'E', 'Ę' => 'E', 'Ě' => 'E', 'Ĕ' => 'E', 'Ė' => 'E',
        'Ĝ' => 'G', 'Ğ' => 'G', 'Ġ' => 'G', 'Ģ' => 'G', 'Ĥ' => 'H', 'Ħ' => 'H',
        'Ì' => 'I', 'Í' => 'I', 'Î' => 'I', 'Ï' => 'I', 'Ī' => 'I', 'Ĩ' => 'I', 'Ĭ' => 'I', 'Į' => 'I', 'İ' => 'I', 'Ĳ' => 'IJ',
        // The L-like letters below mapped to 'K' in the original; 'L' matches the lowercase entries
        'Ĵ' => 'J', 'Ķ' => 'K', 'Ł' => 'L', 'Ľ' => 'L', 'Ĺ' => 'L', 'Ļ' => 'L', 'Ŀ' => 'L',
        'Ñ' => 'N', 'Ń' => 'N', 'Ň' => 'N', 'Ņ' => 'N', 'Ŋ' => 'N',
        'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O', 'Ö' => 'Oe', 'Ö' => 'Oe', 'Ø' => 'O', 'Ō' => 'O', 'Ő' => 'O', 'Ŏ' => 'O', 'Œ' => 'OE',
        'Ŕ' => 'R', 'Ř' => 'R', 'Ŗ' => 'R', 'Ś' => 'S', 'Š' => 'S', 'Ş' => 'S', 'Ŝ' => 'S', 'Ș' => 'S',
        'Ť' => 'T', 'Ţ' => 'T', 'Ŧ' => 'T', 'Ț' => 'T',
        'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'Ue', 'Ū' => 'U', 'Ü' => 'Ue', 'Ů' => 'U', 'Ű' => 'U', 'Ŭ' => 'U', 'Ũ' => 'U', 'Ų' => 'U',
        'Ŵ' => 'W', 'Ý' => 'Y', 'Ŷ' => 'Y', 'Ÿ' => 'Y', 'Ź' => 'Z', 'Ž' => 'Z', 'Ż' => 'Z', 'Þ' => 'T',
        'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'ae', 'ä' => 'ae', 'å' => 'a', 'ā' => 'a', 'ą' => 'a', 'ă' => 'a', 'æ' => 'ae',
        'ç' => 'c', 'ć' => 'c', 'č' => 'c', 'ĉ' => 'c', 'ċ' => 'c', 'ď' => 'd', 'đ' => 'd', 'ð' => 'd',
        'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e', 'ē' => 'e', 'ę' => 'e', 'ě' => 'e', 'ĕ' => 'e', 'ė' => 'e', 'ƒ' => 'f',
        'ĝ' => 'g', 'ğ' => 'g', 'ġ' => 'g', 'ģ' => 'g', 'ĥ' => 'h', 'ħ' => 'h',
        'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ī' => 'i', 'ĩ' => 'i', 'ĭ' => 'i', 'į' => 'i', 'ı' => 'i', 'ĳ' => 'ij',
        'ĵ' => 'j', 'ķ' => 'k', 'ĸ' => 'k', 'ł' => 'l', 'ľ' => 'l', 'ĺ' => 'l', 'ļ' => 'l', 'ŀ' => 'l',
        'ñ' => 'n', 'ń' => 'n', 'ň' => 'n', 'ņ' => 'n', 'ŉ' => 'n', 'ŋ' => 'n',
        'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'oe', 'ö' => 'oe', 'ø' => 'o', 'ō' => 'o', 'ő' => 'o', 'ŏ' => 'o', 'œ' => 'oe',
        'ŕ' => 'r', 'ř' => 'r', 'ŗ' => 'r', 'š' => 's',
        'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'ue', 'ū' => 'u', 'ü' => 'ue', 'ů' => 'u', 'ű' => 'u', 'ŭ' => 'u', 'ũ' => 'u', 'ų' => 'u',
        'ŵ' => 'w', 'ý' => 'y', 'ÿ' => 'y', 'ŷ' => 'y', 'ž' => 'z', 'ż' => 'z', 'ź' => 'z', 'þ' => 't', 'ß' => 'ss', 'ſ' => 'ss',
        'ый' => 'iy',
        'А' => 'A', 'Б' => 'B', 'В' => 'V', 'Г' => 'G', 'Д' => 'D', 'Е' => 'E', 'Ё' => 'YO', 'Ж' => 'ZH', 'З' => 'Z', 'И' => 'I', 'Й' => 'Y',
        'К' => 'K', 'Л' => 'L', 'М' => 'M', 'Н' => 'N', 'О' => 'O', 'П' => 'P', 'Р' => 'R', 'С' => 'S', 'Т' => 'T', 'У' => 'U', 'Ф' => 'F',
        'Х' => 'H', 'Ц' => 'C', 'Ч' => 'CH', 'Ш' => 'SH', 'Щ' => 'SCH', 'Ъ' => '', 'Ы' => 'Y', 'Ь' => '', 'Э' => 'E', 'Ю' => 'YU', 'Я' => 'YA',
        'а' => 'a', 'б' => 'b', 'в' => 'v', 'г' => 'g', 'д' => 'd', 'е' => 'e', 'ё' => 'yo', 'ж' => 'zh', 'з' => 'z', 'и' => 'i', 'й' => 'y',
        'к' => 'k', 'л' => 'l', 'м' => 'm', 'н' => 'n', 'о' => 'o', 'п' => 'p', 'р' => 'r', 'с' => 's', 'т' => 't', 'у' => 'u', 'ф' => 'f',
        'х' => 'h', 'ц' => 'c', 'ч' => 'ch', 'ш' => 'sh', 'щ' => 'sch', 'ъ' => '', 'ы' => 'y', 'ь' => '', 'э' => 'e', 'ю' => 'yu', 'я' => 'ya'
    ];

    // make a human readable string
    $text = strtr($text, $replace);

    // replace non letter or digits by -
    $text = preg_replace('~[^\pL\d.]+~u', '-', $text);

    // trim
    $text = trim($text, '-');

    // remove unwanted characters
    $text = preg_replace('~[^-\w.]+~', '', $text);

    return strtolower($text);
}

// Fourth test
// https://stackoverflow.com/a/2955521/10232729
function slugagain($string){
    $table = [
        'Š'=>'S', 'š'=>'s', 'Đ'=>'Dj', 'đ'=>'dj', 'Ž'=>'Z', 'ž'=>'z', 'Č'=>'C', 'č'=>'c', 'Ć'=>'C', 'ć'=>'c',
        'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
        'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O',
        'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss',
        'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e',
        'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o',
        'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'ý'=>'y', 'þ'=>'b',
        'ÿ'=>'y', 'Ŕ'=>'R', 'ŕ'=>'r', ' '=>'-'
    ];
    return strtr($string, $table);
}

// Fifth test
// https://stackoverflow.com/a/27396804/10232729
function slugifybis($url){
    $url = trim($url);
    $url = str_replace(' ', '-', $url);
    $url = str_replace('/', '-slash-', $url);
    return rawurlencode($url);
}

// Sixth and last test
// https://stackoverflow.com/a/39442034/10232729
setlocale( LC_ALL, "en_US.UTF8" );
function slugifyagain($string){
    $string = iconv('utf-8', 'us-ascii//translit//ignore', $string); // transliterate
    $string = str_replace("'", '', $string);
    $string = preg_replace('~[^\pL\d]+~u', '-', $string); // replace non letter or non digits by "-"
    $string = preg_replace('~[^-\w]+~', '', $string); // remove unwanted characters
    $string = preg_replace('~-+~', '-', $string); // remove duplicate "-"
    $string = trim($string, '-'); // trim "-"
    $string = trim($string); // trim
    $string = mb_strtolower($string, 'utf-8'); // lowercase
    return urlencode($string); // safe
}

$string = $newString = "¿ Àñdréß l'affreux ğarçon & nøël en forêt !";

$max = 10000;

echo '<pre>';
echo 'Beginning :';
echo '<br />';
echo '<br />';

echo '> Slugging '.$max.' iterations of following :';
echo '<br />';
echo '>> ' . $string;
echo '<br />';
echo '<br />';

echo 'Output results :';
echo '<br />';
echo '<br />';

$start = microtime(true);
for($i = 0 ; $i < $max ; $i++){
    $newString = slugify($string);
}
$time = (microtime(true) - $start) * 1000;
echo '> First test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '<br />';
echo '<br />';

$start = microtime(true);
for($i = 0 ; $i < $max ; $i++){
    $newString = slug($string);
}
$time = (microtime(true) - $start) * 1000;
echo '> Second test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '<br />';
echo '<br />';

$start = microtime(true);
for($i = 0 ; $i < $max ; $i++){
    $newString = slugbis($string);
}
$time = (microtime(true) - $start) * 1000;
echo '> Third test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '<br />';
echo '<br />';

$start = microtime(true);
for($i = 0 ; $i < $max ; $i++){
    $newString = slugagain($string);
}
$time = (microtime(true) - $start) * 1000;
echo '> Fourth test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '<br />';
echo '<br />';

$start = microtime(true);
for($i = 0 ; $i < $max ; $i++){
    $newString = slugifybis($string);
}
$time = (microtime(true) - $start) * 1000;
echo '> Fifth test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '<br />';
echo '<br />';

$start = microtime(true);
for($i = 0 ; $i < $max ; $i++){
    $newString = slugifyagain($string);
}
$time = (microtime(true) - $start) * 1000;
echo '> Sixth test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '</pre>';

Beginning:

Slugging 10000 iterations of the following:

¿ Àñdréß l'affreux ğarçon & nøël en forêt !

Output results:

First test passed in 120.78ms

Result: -iquest-andresz-laffreux-arcon-and-noel-en-foret-

Second test passed in 3883.82ms

Result: -andreß-laffreux-garcon--nøel-en-foret-

Third test passed in 56.83ms

Result: andress-l-affreux-garcon-noel-en-foret

Fourth test passed in 18.93ms

Result: ¿-AndreSs-l'affreux-ğarcon-&-noel-en-foret-!

Fifth test passed in 6.45ms

Result: %C2%BF-%C3%80%C3%B1dr%C3%A9%C3%9F-l%27affreux-%C4%9Far%C3%A7on-%26-n%C3%B8%C3%ABl-en-for%C3%AAt-%21

Sixth test passed in 112.42ms

Result: andress-laffreux-garcon-n-el-en-foret

Further tests are needed.

Edit: test with fewer iterations

Beginning:

Slugging 100 iterations of the following:

¿ Àñdréß l'affreux ğarçon & nøël en forêt !

Output results:

First test passed in 1.72ms

Result: -iquest-andresz-laffreux-arcon-and-noel-en-foret-

Second test passed in 48.59ms

Result: -andreß-laffreux-garcon--nøel-en-foret-

Third test passed in 0.91ms

Result: andress-l-affreux-garcon-noel-en-foret

Fourth test passed in 0.3ms

Result: ¿-AndreSs-l'affreux-ğarcon-&-noel-en-foret-!

Fifth test passed in 0.14ms

Result: %C2%BF-%C3%80%C3%B1dr%C3%A9%C3%9F-l%27affreux-%C4%9Far%C3%A7on-%26-n%C3%B8%C3%ABl-en-for%C3%AAt-%21

Sixth test passed in 1.4ms

Result: andress-laffreux-garcon-n-el-en-foret
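The six timing loops in the benchmark all follow the same pattern, which could be factored into one helper. This is only a sketch: the function name bench and its parameters are hypothetical, and rawurlencode merely stands in for whichever slug function is being measured.

```php
<?php
// Hypothetical helper: time $iterations calls of $fn on $input and return the
// elapsed wall-clock time in milliseconds.
function bench(callable $fn, $input, $iterations = 10000)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($input);
    }
    return (microtime(true) - $start) * 1000;
}

$input = "¿ Àñdréß l'affreux ğarçon & nøël en forêt !";

// Any callable can be measured; rawurlencode is just a stand-in here.
printf("%.2fms\n", bench('rawurlencode', $input, 100));
```

Timings from a single run like this vary with machine load, so comparing several runs is advisable before drawing conclusions.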

Jiří Dvořák · Answer

On my localhost everything was fine, but on the server what helped was calling setlocale and passing "utf-8" to mb_strtolower.

<?php
setlocale( LC_ALL, "en_US.UTF8" );
function slug( $string )
{
    $string = iconv( "utf-8", "us-ascii//translit//ignore", $string ); // transliterate
    $string = str_replace( "'", "", $string );
    $string = preg_replace( "~[^\pL\d]+~u", "-", $string ); // replace non letter or non digits by "-"
    $string = preg_replace( "~[^-\w]+~", "", $string ); // remove unwanted characters
    $string = preg_replace( "~-+~", "-", $string ); // remove duplicate "-"
    $string = trim( $string, "-" ); // trim "-"
    $string = trim( $string ); // trim
    $string = mb_strtolower( $string, "utf-8" ); // lowercase
    $string = urlencode( $string ); // safe
    return $string;
}
?>

Paulo Victor · Answer

I think the most elegant way is to use Behat\Transliterator\Transliterator.

You need to extend this class with your own because it is abstract, something like this:

<?php
use Behat\Transliterator\Transliterator;

class Urlizer extends Transliterator
{
}

And then, just use it:

$text = "Master Ápiu";
$urlizer = new Urlizer();
$slug = $urlizer->transliterate($text, "-");

echo $slug; // master-apiu

Of course, you should also add it to your Composer dependencies.

composer require behat/transliterator

More info here: https://github.com/Behat/Transliterator

MTJ · Answer

Since gTLDs and IDNs are becoming more and more common, I can't see why a URL shouldn't contain Andrés.

Just rawurlencode the $URL you want instead. Most browsers show UTF-8 characters in URLs (though not some ancient IE6, maybe), and bit.ly / goo.gl can be used to shorten the link in cases like Russian and Arabic if need be for advertising purposes, or you can simply write the URL in ads the way users would type it in the browser.

The only difference is spaces " "; it might be a good idea to replace them with "-", and "/" too if you don't want to allow it.

<?php
function slugify($url)
{
    $url = trim($url);
    $url = str_replace(" ", "-", $url);
    $url = str_replace("/", "-slash-", $url);
    $url = rawurlencode($url);
    return $url; // the original snippet computed the value but never returned it
}
?>

URL as encoded: http://www.hurtta.com/RU/%D0%9F%D1%80%D0%BE%D0%B4%D1%83%D0%BA%D1%82%D1%8B/

URL as written: http://www.hurtta.com/RU/Продукты/
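As a quick check of the encoded and written forms shown above, the sketch below percent-encodes the path segment and decodes it again. It assumes the helper is meant to return the encoded string; slugify_raw is a hypothetical name chosen to avoid clashing with the other slugify functions on this page.

```php
<?php
// Hypothetical helper following the approach above: replace spaces and
// slashes, then percent-encode whatever is left.
function slugify_raw($url)
{
    $url = trim($url);
    $url = str_replace(' ', '-', $url);
    $url = str_replace('/', '-slash-', $url);
    return rawurlencode($url);
}

$encoded = slugify_raw('Продукты');
echo $encoded, "\n";               // %D0%9F%D1%80%D0%BE%D0%B4%D1%83%D0%BA%D1%82%D1%8B
echo rawurldecode($encoded), "\n"; // Продукты
```

Because rawurlencode is reversible, no information is lost, which is the main difference from the transliterating approaches in the other answers.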

daygloink · Answer

I wrote this based on Maerlyn's answer. This function will work regardless of the character encoding on the page. It also won't turn single quotes into dashes :)

function slugify ($string) {
    $string = utf8_encode($string);
    $string = iconv('UTF-8', 'ASCII//TRANSLIT', $string);
    $string = preg_replace('/[^a-z0-9- ]/i', '', $string);
    $string = str_replace(' ', '-', $string);
    $string = trim($string, '-');
    $string = strtolower($string);

    if (empty($string)) {
        return 'n-a';
    }

    return $string;
}
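One caveat worth illustrating: utf8_encode() assumes its input is ISO-8859-1, so a string that is already UTF-8 gets encoded a second time, and the function is deprecated as of PHP 8.2. A possible alternative, sketched here under the assumption that any non-UTF-8 input is ISO-8859-1, is to convert only when the bytes are not valid UTF-8. The helper name ensure_utf8 is hypothetical and it requires the mbstring extension.

```php
<?php
// Hypothetical helper: convert to UTF-8 only when the bytes are not already
// valid UTF-8. Assumes any non-UTF-8 input is ISO-8859-1. Requires mbstring.
function ensure_utf8($string)
{
    if (mb_check_encoding($string, 'UTF-8')) {
        return $string;
    }
    return mb_convert_encoding($string, 'UTF-8', 'ISO-8859-1');
}

echo ensure_utf8("andr\xE9s"), "\n";     // andrés (input was ISO-8859-1)
echo ensure_utf8("andr\xC3\xA9s"), "\n"; // andrés (already UTF-8, left alone)
```

The validity check is a heuristic: some ISO-8859-1 byte sequences also happen to be valid UTF-8, so knowing the real source encoding is always preferable.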

Ima · Answer

I have working code that has been used on a Spanish website. Please see the code on my blog:

Function to generate clean URL slugs from a string, with duplicate checking

Muhammad Bilal · Answer

I have seen a lot of methods here, but I found a simpler one for myself. Maybe it will help someone.

$slug = strtolower(preg_replace('/[^a-zA-Z0-9\-]/', '',preg_replace('/\s+/', '-', $string) ));