Je extrait un fichier JSON d'un site et l'une des chaînes reçues est la suivante:
The Weeknd ‘King Of The Fall’ [Video Premiere] | @TheWeeknd | #SoPhi
Comment puis-je convertir des éléments tels que ‘
en caractères corrects?
J'ai créé un terrain de jeu Xcode pour le démontrer:
import UIKit
var error: NSError?
let blogUrl: NSURL = NSURL.URLWithString("http://sophisticatedignorance.net/api/get_recent_summary/")
let jsonData = NSData(contentsOfURL: blogUrl)
let dataDictionary = NSJSONSerialization.JSONObjectWithData(jsonData, options: nil, error: &error) as NSDictionary
var a = dataDictionary["posts"] as NSArray
println(a[0]["title"])
Il n'y a pas de moyen simple de le faire, mais vous pouvez utiliser NSAttributedString
magic pour rendre ce processus aussi simple que possible (soyez averti que cette méthode effacera également toutes les balises HTML):
let encodedString = "The Weeknd <em>‘King Of The Fall’</em>"
// encodedString should = a[0]["title"] in your case
guard let data = htmlEncodedString.data(using: .utf8) else {
return nil
}
let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
guard let attributedString = try? NSAttributedString(data: data, options: options) else {
return nil
}
let decodedString = attributedString.string // The Weeknd ‘King Of The Fall’
Pensez à initialiser NSAttributedString à partir du thread principal uniquement. Il utilise un peu de magie WebKit dessous, donc l'exigence.
Vous pouvez créer votre propre extension String
pour augmenter la réutilisabilité:
extension String {
init?(htmlEncodedString: String) {
guard let data = htmlEncodedString.data(using: .utf8) else {
return nil
}
let options: [String: Any] = [
NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: String.Encoding.utf8.rawValue
]
guard let attributedString = try? NSAttributedString(data: data, options: options, documentAttributes: nil) else {
return nil
}
self.init(attributedString.string)
}
}
let encodedString = "The Weeknd <em>‘King Of The Fall’</em>"
let decodedString = String(htmlEncodedString: encodedString)
La réponse de @ akashivskyy est excellente et montre comment utiliser NSAttributedString
pour décoder des entités HTML. Un inconvénient possible (comme il l’a déclaré) est que tout le balisage HTML est également supprimé, donc
<strong> 4 < 5 & 3 > 2</strong>
devient
4 < 5 & 3 > 2
Sur OS X, il y a CFXMLCreateStringByUnescapingEntities()
qui fait le travail:
let encoded = "<strong> 4 < 5 & 3 > 2 .</strong> Price: 12 €. @ "
let decoded = CFXMLCreateStringByUnescapingEntities(nil, encoded, nil) as String
println(decoded)
// <strong> 4 < 5 & 3 > 2 .</strong> Price: 12 €. @
mais ce n'est pas disponible sur iOS.
Voici une implémentation Swift pure. Il décode les entités de caractères Références telles que <
à l'aide d'un dictionnaire, ainsi que toutes les entités numériques Telles que @
ou €
. (Notez que je n'ai pas répertorié toutes les entités 252 explicitement.)
Swift 4:
// Mapping from XML/HTML character entity reference to character
// From http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
private let characterEntities : [ Substring : Character ] = [
// XML predefined entities:
""" : "\"",
"&" : "&",
"'" : "'",
"<" : "<",
">" : ">",
// HTML character entity references:
" " : "\u{00a0}",
// ...
"♦" : "♦",
]
extension String {
/// Returns a new string made by replacing in the `String`
/// all HTML character entity references with the corresponding
/// character.
var stringByDecodingHTMLEntities : String {
// ===== Utility functions =====
// Convert the number in the string to the corresponding
// Unicode character, e.g.
// decodeNumeric("64", 10) --> "@"
// decodeNumeric("20ac", 16) --> "€"
func decodeNumeric(_ string : Substring, base : Int) -> Character? {
guard let code = UInt32(string, radix: base),
let uniScalar = UnicodeScalar(code) else { return nil }
return Character(uniScalar)
}
// Decode the HTML character entity to the corresponding
// Unicode character, return `nil` for invalid input.
// decode("@") --> "@"
// decode("€") --> "€"
// decode("<") --> "<"
// decode("&foo;") --> nil
func decode(_ entity : Substring) -> Character? {
if entity.hasPrefix("&#x") || entity.hasPrefix("&#X") {
return decodeNumeric(entity.dropFirst(3).dropLast(), base: 16)
} else if entity.hasPrefix("&#") {
return decodeNumeric(entity.dropFirst(2).dropLast(), base: 10)
} else {
return characterEntities[entity]
}
}
// ===== Method starts here =====
var result = ""
var position = startIndex
// Find the next '&' and copy the characters preceding it to `result`:
while let ampRange = self[position...].range(of: "&") {
result.append(contentsOf: self[position ..< ampRange.lowerBound])
position = ampRange.lowerBound
// Find the next ';' and copy everything from '&' to ';' into `entity`
guard let semiRange = self[position...].range(of: ";") else {
// No matching ';'.
break
}
let entity = self[position ..< semiRange.upperBound]
position = semiRange.upperBound
if let decoded = decode(entity) {
// Replace by decoded character:
result.append(decoded)
} else {
// Invalid entity, copy verbatim:
result.append(contentsOf: entity)
}
}
// Copy remaining characters to `result`:
result.append(contentsOf: self[position...])
return result
}
}
Exemple:
let encoded = "<strong> 4 < 5 & 3 > 2 .</strong> Price: 12 €. @ "
let decoded = encoded.stringByDecodingHTMLEntities
print(decoded)
// <strong> 4 < 5 & 3 > 2 .</strong> Price: 12 €. @
Swift 3:
// Mapping from XML/HTML character entity reference to character
// From http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
private let characterEntities : [ String : Character ] = [
// XML predefined entities:
""" : "\"",
"&" : "&",
"'" : "'",
"<" : "<",
">" : ">",
// HTML character entity references:
" " : "\u{00a0}",
// ...
"♦" : "♦",
]
extension String {
/// Returns a new string made by replacing in the `String`
/// all HTML character entity references with the corresponding
/// character.
var stringByDecodingHTMLEntities : String {
// ===== Utility functions =====
// Convert the number in the string to the corresponding
// Unicode character, e.g.
// decodeNumeric("64", 10) --> "@"
// decodeNumeric("20ac", 16) --> "€"
func decodeNumeric(_ string : String, base : Int) -> Character? {
guard let code = UInt32(string, radix: base),
let uniScalar = UnicodeScalar(code) else { return nil }
return Character(uniScalar)
}
// Decode the HTML character entity to the corresponding
// Unicode character, return `nil` for invalid input.
// decode("@") --> "@"
// decode("€") --> "€"
// decode("<") --> "<"
// decode("&foo;") --> nil
func decode(_ entity : String) -> Character? {
if entity.hasPrefix("&#x") || entity.hasPrefix("&#X"){
return decodeNumeric(entity.substring(with: entity.index(entity.startIndex, offsetBy: 3) ..< entity.index(entity.endIndex, offsetBy: -1)), base: 16)
} else if entity.hasPrefix("&#") {
return decodeNumeric(entity.substring(with: entity.index(entity.startIndex, offsetBy: 2) ..< entity.index(entity.endIndex, offsetBy: -1)), base: 10)
} else {
return characterEntities[entity]
}
}
// ===== Method starts here =====
var result = ""
var position = startIndex
// Find the next '&' and copy the characters preceding it to `result`:
while let ampRange = self.range(of: "&", range: position ..< endIndex) {
result.append(self[position ..< ampRange.lowerBound])
position = ampRange.lowerBound
// Find the next ';' and copy everything from '&' to ';' into `entity`
if let semiRange = self.range(of: ";", range: position ..< endIndex) {
let entity = self[position ..< semiRange.upperBound]
position = semiRange.upperBound
if let decoded = decode(entity) {
// Replace by decoded character:
result.append(decoded)
} else {
// Invalid entity, copy verbatim:
result.append(entity)
}
} else {
// No matching ';'.
break
}
}
// Copy remaining characters to `result`:
result.append(self[position ..< endIndex])
return result
}
}
Swift 2:
// Mapping from XML/HTML character entity reference to character
// From http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
private let characterEntities : [ String : Character ] = [
// XML predefined entities:
""" : "\"",
"&" : "&",
"'" : "'",
"<" : "<",
">" : ">",
// HTML character entity references:
" " : "\u{00a0}",
// ...
"♦" : "♦",
]
extension String {
/// Returns a new string made by replacing in the `String`
/// all HTML character entity references with the corresponding
/// character.
var stringByDecodingHTMLEntities : String {
// ===== Utility functions =====
// Convert the number in the string to the corresponding
// Unicode character, e.g.
// decodeNumeric("64", 10) --> "@"
// decodeNumeric("20ac", 16) --> "€"
func decodeNumeric(string : String, base : Int32) -> Character? {
let code = UInt32(strtoul(string, nil, base))
return Character(UnicodeScalar(code))
}
// Decode the HTML character entity to the corresponding
// Unicode character, return `nil` for invalid input.
// decode("@") --> "@"
// decode("€") --> "€"
// decode("<") --> "<"
// decode("&foo;") --> nil
func decode(entity : String) -> Character? {
if entity.hasPrefix("&#x") || entity.hasPrefix("&#X"){
return decodeNumeric(entity.substringFromIndex(entity.startIndex.advancedBy(3)), base: 16)
} else if entity.hasPrefix("&#") {
return decodeNumeric(entity.substringFromIndex(entity.startIndex.advancedBy(2)), base: 10)
} else {
return characterEntities[entity]
}
}
// ===== Method starts here =====
var result = ""
var position = startIndex
// Find the next '&' and copy the characters preceding it to `result`:
while let ampRange = self.rangeOfString("&", range: position ..< endIndex) {
result.appendContentsOf(self[position ..< ampRange.startIndex])
position = ampRange.startIndex
// Find the next ';' and copy everything from '&' to ';' into `entity`
if let semiRange = self.rangeOfString(";", range: position ..< endIndex) {
let entity = self[position ..< semiRange.endIndex]
position = semiRange.endIndex
if let decoded = decode(entity) {
// Replace by decoded character:
result.append(decoded)
} else {
// Invalid entity, copy verbatim:
result.appendContentsOf(entity)
}
} else {
// No matching ';'.
break
}
}
// Copy remaining characters to `result`:
result.appendContentsOf(self[position ..< endIndex])
return result
}
}
Swift 3 version de @ akashivskyy extension ,
extension String {
init(htmlEncodedString: String) {
self.init()
guard let encodedData = htmlEncodedString.data(using: .utf8) else {
self = htmlEncodedString
return
}
let attributedOptions: [String : Any] = [
NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: String.Encoding.utf8.rawValue
]
do {
let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
self = attributedString.string
} catch {
print("Error: \(error)")
self = htmlEncodedString
}
}
}
Swift 2 version de l'extension @ akashivskyy,
extension String {
init(htmlEncodedString: String) {
if let encodedData = htmlEncodedString.dataUsingEncoding(NSUTF8StringEncoding){
let attributedOptions : [String: AnyObject] = [
NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: NSUTF8StringEncoding
]
do{
if let attributedString:NSAttributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil){
self.init(attributedString.string)
}else{
print("error")
self.init(htmlEncodedString) //Returning actual string if there is an error
}
}catch{
print("error: \(error)")
self.init(htmlEncodedString) //Returning actual string if there is an error
}
}else{
self.init(htmlEncodedString) //Returning actual string if there is an error
}
}
}
Swift 4
extension String {
var htmlDecoded: String {
let decoded = try? NSAttributedString(data: Data(utf8), options: [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
], documentAttributes: nil).string
return decoded ?? self
}
}
extension String{
func decodeEnt() -> String{
let encodedData = self.dataUsingEncoding(NSUTF8StringEncoding)!
let attributedOptions : [String: AnyObject] = [
NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: NSUTF8StringEncoding
]
let attributedString = NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil, error: nil)!
return attributedString.string
}
}
let encodedString = "The Weeknd ‘King Of The Fall’"
let foo = encodedString.decodeEnt() // The Weeknd ‘King Of The Fall’
extension String {
init(htmlEncodedString: String) {
self.init()
guard let encodedData = htmlEncodedString.data(using: .utf8) else {
self = htmlEncodedString
return
}
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
do {
let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
self = attributedString.string
} catch {
print("Error: \(error)")
self = htmlEncodedString
}
}
}
Je recherchais un pur utilitaire Swift 3.0 pour échapper aux références de caractères HTML (pour les applications Swift côté serveur sous MacOS et Linux), mais je n'ai trouvé aucune solution complète. J'ai donc écrit ma propre implémentation: https://github.com/IBM-Swift/swift-html-entities
Le package, HTMLEntities
, fonctionne avec les références de caractères nommés HTML4 ainsi que les références de caractères numériques hexadécimales/décimales. Il reconnaîtra les références de caractères numériques spéciaux conformément à la spécification HTML5 W3 (c.-à-d. €
doit être non échappé sous le symbole Euro (unicode U+20AC
) et NON en tant que caractère unicode pour U+0080
, et certaines plages de références de caractères numériques doivent être remplacées par le caractère de remplacement U+FFFD
lorsqu’elles sont respectées.
Exemple d'utilisation:
import HTMLEntities
// encode example
let html = "<script>alert(\"abc\")</script>"
print(html.htmlEscape())
// Prints ”<script>alert("abc")</script>"
// decode example
let htmlencoded = "<script>alert("abc")</script>"
print(htmlencoded.htmlUnescape())
// Prints ”<script>alert(\"abc\")</script>"
Et pour l'exemple d'OP:
print("The Weeknd ‘King Of The Fall’ [Video Premiere] | @TheWeeknd | #SoPhi ".htmlUnescape())
// prints "The Weeknd ‘King Of The Fall’ [Video Premiere] | @TheWeeknd | #SoPhi "
Edit: HTMLEntities
prend désormais en charge les références de caractères nommés HTML5 à partir de la version 2.0.0. L'analyse conforme à la spécification est également implémentée.
Elegant Swift 4 Solution
Si vous voulez une ficelle
myString = String(htmlString: encodedString)
Ajouter cette extension à votre projet
extension String {
init(htmlString: String) {
self.init()
guard let encodedData = htmlString.data(using: .utf8) else {
self = htmlString
return
}
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
do {
let attributedString = try NSAttributedString(data: encodedData,
options: attributedOptions,
documentAttributes: nil)
self = attributedString.string
} catch {
print("Error: \(error.localizedDescription)")
self = htmlString
}
}
}
Si vous voulez un NSAttributedString avec gras, italique, liens, etc.:
textField.attributedText = try? NSAttributedString(htmlString: encodedString)
Ajouter cette extension à votre projet
extension NSAttributedString {
convenience init(htmlString html: String) throws {
try self.init(data: Data(html.utf8), options: [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
], documentAttributes: nil)
}
}
Version var calculée de réponse de @yishus
public extension String {
/// Decodes string with html encoding.
var htmlDecoded: String {
guard let encodedData = self.data(using: .utf8) else { return self }
let attributedOptions: [String : Any] = [
NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: String.Encoding.utf8.rawValue]
do {
let attributedString = try NSAttributedString(data: encodedData,
options: attributedOptions,
documentAttributes: nil)
return attributedString.string
} catch {
print("Error: \(error)")
return self
}
}
}
Swift 4:
La solution totale qui a finalement fonctionné pour moi avec du code html, des caractères de nouvelle ligne et des guillemets simples
extension String {
var htmlDecoded: String {
let decoded = try? NSAttributedString(data: Data(utf8), options: [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
], documentAttributes: nil).string
return decoded ?? self
}
}
Utilisation:
let yourStringEncoded = yourStringWithHtmlcode.htmlDecoded
J'ai ensuite dû appliquer d'autres filtres pour supprimer single quotes
(par exemple: don't, hasn't, It's
etc.), ainsi que des caractères de nouvelle ligne tels que \n
.
var yourNewString = String(yourStringEncoded.filter { !"\n\t\r".contains($0) })
yourNewString = yourNewString.replacingOccurrences(of: "\'", with: "", options: NSString.CompareOptions.literal, range: nil)
Swift 4
extension String {
var replacingHTMLEntities: String? {
do {
return try NSAttributedString(data: Data(utf8), options: [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
], documentAttributes: nil).string
} catch {
return nil
}
}
}
Utilisation simple
let clean = "Weeknd ‘King Of The Fall’".replacingHTMLEntities ?? "default value"
Ce serait mon approche. Vous pouvez ajouter le dictionnaire des entités à partir de https://Gist.github.com/mwaterfall/25b4a6a06dc3309d9555 mentionne Michael Waterfall.
extension String {
func htmlDecoded()->String {
guard (self != "") else { return self }
var newStr = self
let entities = [
""" : "\"",
"&" : "&",
"'" : "'",
"<" : "<",
">" : ">",
]
for (name,value) in entities {
newStr = newStr.stringByReplacingOccurrencesOfString(name, withString: value)
}
return newStr
}
}
Exemples utilisés:
let encoded = "this is so "good""
let decoded = encoded.htmlDecoded() // "this is so "good""
OR
let encoded = "this is so "good"".htmlDecoded() // "this is so "good""
Swift 4
func decodeHTML(string: String) -> String? {
var decodedString: String?
if let encodedData = string.data(using: .utf8) {
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
]
do {
decodedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil).string
} catch {
print("\(error.localizedDescription)")
}
}
return decodedString
}
Mise à jour de la réponse sur Swift 3
extension String {
init?(htmlEncodedString: String) {
let encodedData = htmlEncodedString.data(using: String.Encoding.utf8)!
let attributedOptions = [ NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType]
guard let attributedString = try? NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil) else {
return nil
}
self.init(attributedString.string)
}
Swift 4
extension String {
mutating func toHtmlEncodedString() {
guard let encodedData = self.data(using: .utf8) else {
return
}
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.documentType.rawValue): NSAttributedString.DocumentType.html,
NSAttributedString.DocumentReadingOptionKey(rawValue: NSAttributedString.DocumentAttributeKey.characterEncoding.rawValue): String.Encoding.utf8.rawValue
]
do {
let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)
self = attributedString.string
} catch {
print("Error: \(error)")
}
}
Objectif c
+(NSString *) decodeHTMLEnocdedString:(NSString *)htmlEncodedString {
if (!htmlEncodedString) {
return nil;
}
NSData *data = [htmlEncodedString dataUsingEncoding:NSUTF8StringEncoding];
NSDictionary *attributes = @{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)};
NSAttributedString *attributedString = [[NSAttributedString alloc] initWithData:data options:attributes documentAttributes:nil error:nil];
return [attributedString string];
}
Normalement, si vous convertissez directement le code HTML en chaîne attribuée, la taille de la police est augmentée. Vous pouvez essayer de convertir une chaîne HTML en chaîne attribuée, puis de nouveau pour voir la différence.
À la place, voici la conversion de la taille réelle} qui s'assure que la taille de la police ne change pas en appliquant le ratio de 0,75 à toutes les polices
extension String {
func htmlAttributedString() -> NSAttributedString? {
guard let data = self.data(using: String.Encoding.utf16, allowLossyConversion: false) else { return nil }
guard let attriStr = try? NSMutableAttributedString(
data: data,
options: [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType],
documentAttributes: nil) else { return nil }
attriStr.beginEditing()
attriStr.enumerateAttribute(NSFontAttributeName, in: NSMakeRange(0, attriStr.length), options: .init(rawValue: 0)) {
(value, range, stop) in
if let font = value as? UIFont {
let resizedFont = font.withSize(font.pointSize * 0.75)
attriStr.addAttribute(NSFontAttributeName,
value: resizedFont,
range: range)
}
}
attriStr.endEditing()
return attriStr
}
}
Swift4
J'aime beaucoup la solution utilisant documentAttributes, mais il est possible qu’elle ralentisse l’analyse des fichiers et/ou l’utilisation dans les cellules de la vue tableau. Je ne peux pas croire que Apple ne fournit pas une solution décente à cet égard.
En guise de solution de contournement, j'ai trouvé sur GitHub cette extension de chaîne qui fonctionne parfaitement et rapidement pour le décodage.
Donc pour les situations dans lesquelles la réponse donnée est lente voir la solution suggérée dans ce lien: https://Gist.github.com/mwaterfall/25b4a6a06dc3309d9555 Remarque: c'est le cas pas analyser les balises HTML.
Swift 4.1 +
var htmlDecoded: String {
let attributedOptions: [NSAttributedString.DocumentReadingOptionKey : Any] = [
NSAttributedString.DocumentReadingOptionKey.documentType : NSAttributedString.DocumentType.html,
NSAttributedString.DocumentReadingOptionKey.characterEncoding : String.Encoding.utf8.rawValue
]
let decoded = try? NSAttributedString(data: Data(utf8), options: attributedOptions
, documentAttributes: nil).string
return decoded ?? self
}
Pour être complet, j'ai copié les principales caractéristiques du site: