How do I get a web page's Open Graph Protocol data with PHP?

Question

PHP has a simple function for getting a web page's meta tags (get_meta_tags), but it only works for meta tags that have a name attribute. However, the Open Graph Protocol is becoming more and more popular these days. What is the easiest way to get the OGP values from a web page? For example:

<meta property="og:url" content="">
<meta property="og:title" content="">
<meta property="og:description" content="">
<meta property="og:type" content="">

The basic approach I can see is to fetch the page with cURL and parse it with a regex. Any ideas?
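To make the limitation concrete, here is a minimal sketch of what get_meta_tags returns for a page that mixes name-based and property-based meta tags. The sample markup and the temporary file are assumptions for illustration only; get_meta_tags reads from a file name or URL, not from a string.

```php
<?php
// Hypothetical sample page, written to a temporary file because
// get_meta_tags() expects a file name or URL rather than a string.
$page = '<html><head>'
      . '<meta name="description" content="Demo page">'
      . '<meta property="og:title" content="Demo title">'
      . '</head><body></body></html>';

$path = tempnam(sys_get_temp_dir(), 'og');
file_put_contents($path, $page);

$named = get_meta_tags($path);
unlink($path); // remove the temporary file again

print_r($named); // contains "description" but no "og:title" entry
```

Only the tag with a name attribute shows up, which is why the Open Graph tags need a different approach.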

Tom · Accepted Answer

When parsing data from HTML, you really shouldn't use regex. Take a look at the DOMXPath query method.

Now, the actual code could be:

[EDIT] Stefan Gehrig gave a better XPath query, so the code can be shortened to:

libxml_use_internal_errors(true); // Yeah, if you are so worried about using @ with warnings
$doc = new DomDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$query = '//*/meta[starts-with(@property, \'og:\')]';
$metas = $xpath->query($query);
$rmetas = array();
foreach ($metas as $meta) {
    $property = $meta->getAttribute('property');
    $content = $meta->getAttribute('content');
    $rmetas[$property] = $content;
}
var_dump($rmetas);

Instead of:

$doc = new DomDocument();
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$query = '//*/meta';
$metas = $xpath->query($query);
$rmetas = array();
foreach ($metas as $meta) {
    $property = $meta->getAttribute('property');
    $content = $meta->getAttribute('content');
    if (!empty($property) && preg_match('#^og:#', $property)) {
        $rmetas[$property] = $content;
    }
}
var_dump($rmetas);
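Building on the XPath approach, the following is a minimal sketch that wraps it in a reusable function. The function name extractOpenGraph and the sample markup are hypothetical. It assumes the page source is already in a string, restores the previous libxml error setting afterwards, and collects repeated properties (a page may declare several og:image tags) into lists instead of letting the last one overwrite the others.

```php
<?php
/**
 * Collect Open Graph properties from an HTML string.
 * Hypothetical helper: each property maps to a list of content values,
 * so repeated tags such as several og:image entries are all kept.
 */
function extractOpenGraph(string $html): array
{
    // Collect libxml warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    $xpath = new DOMXPath($doc);
    $tags = [];
    foreach ($xpath->query("//meta[starts-with(@property, 'og:')]") as $meta) {
        $tags[$meta->getAttribute('property')][] = $meta->getAttribute('content');
    }
    return $tags;
}

// Usage with hypothetical sample markup.
$sampleHtml = '<html><head>'
            . '<meta property="og:title" content="Demo">'
            . '<meta property="og:image" content="a.jpg">'
            . '<meta property="og:image" content="b.jpg">'
            . '</head><body></body></html>';
print_r(extractOpenGraph($sampleHtml));
```

Returning lists for every property keeps the result shape uniform; callers that only expect one value can take the first element.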

Guilherme Viebig · Answer

Really simple and well done:

Using https://github.com/scottmac/opengraph:

$graph = OpenGraph::fetch('http://www.avessotv.com.br/bastidores-pantene-institute-experience-pg.html');
print_r($graph);

Will return:

OpenGraph Object
(
    [_values:OpenGraph:private] => Array
        (
            [type] => article
            [video] => http://www.avessotv.com.br/player/flowplayer/flowplayer-3.2.7.swf?config=%7B%27clip%27%3A%7B%27url%27%3A%27http%3A%2F%2Fwww.avessotv.com.br%2Fmedia%2Fprogramas%2Fpantene.flv%27%7D%7D
            [image] => /wp-content/thumbnails/9025.jpg
            [site_name] => Programa Avesso - Bastidores
            [title] => Bastidores “Pantene Institute Experience” P&G
            [url] => http://www.avessotv.com.br/bastidores-pantene-institute-experience-pg.html
            [description] => Confira os bastidores do Pantene Institute Experience, da Procter &#038; Gamble. www.pantene.com.br Mais imagens:
        )

    [_position:OpenGraph:private] => 0
)

zerkms · Answer

How about:

preg_match_all('~<\s*meta\s+property="(og:[^"]+)"\s+content="([^"]*)~i', $str, $matches);

So, yes, grab the page any way you can and parse it with regex.
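One caveat with this pattern: it expects property to come directly before content, and both values to be double-quoted. The sketch below runs the same pattern against hypothetical sample markup (not taken from any real page) to show which tag variants it picks up.

```php
<?php
// The pattern from the answer, unchanged.
$pattern = '~<\s*meta\s+property="(og:[^"]+)"\s+content="([^"]*)~i';

// Hypothetical sample: three tags that a browser would treat alike.
$sample = <<<'HTML'
<meta property="og:title" content="Matched">
<meta content="Missed: attributes reversed" property="og:description">
<meta property='og:type' content='Missed: single quotes'>
HTML;

preg_match_all($pattern, $sample, $matches);

// Pair each captured property name with its captured content value.
$found = array_combine($matches[1], $matches[2]);
print_r($found); // only og:title is captured
```

If the pages you fetch are under your control and always use this attribute order and quoting, the regex is enough; otherwise a DOM parser is more forgiving.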

Bhaskar Bhatt · Answer

With this method you get an array of key/value pairs for the Facebook Open Graph tags.

$url = "http://fbcpictures.in";
$site_html = file_get_contents($url);
$matches = null;
preg_match_all('~<\s*meta\s+property="(og:[^"]+)"\s+content="([^"]*)~i', $site_html, $matches);
$ogtags = array();
for ($i = 0; $i < count($matches[1]); $i++) {
    $ogtags[$matches[1][$i]] = $matches[2][$i];
}

Output of facebook open graph tags

Stefan Gehrig · Answer

The most XML-ish way would be to use XPath:

$xml = simplexml_load_file('http://ogp.me/');
$xml->registerXPathNamespace('h', 'http://www.w3.org/1999/xhtml');
$result = array();
foreach ($xml->xpath('//h:meta[starts-with(@property, \'og:\')]') as $meta) {
    $result[(string)$meta['property']] = (string)$meta['content'];
}
print_r($result);

Unfortunately, registering the namespace is required if the HTML document uses a namespace declaration in the <html> tag.
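Note that simplexml_load_file expects well-formed XML, which many real pages are not. As a hedged variant, the sketch below parses the markup with DOMDocument::loadHTML first and then hands the result to SimpleXML through simplexml_import_dom. The function name ogTagsViaSimpleXml is hypothetical, and the sketch assumes the page source is already in a string. To my knowledge the HTML parser does not place elements in a namespace, so no prefix is registered in this variant.

```php
<?php
/**
 * Hypothetical helper: read og:* meta tags with SimpleXML, using the
 * lenient HTML parser so that non-well-formed markup is still accepted.
 */
function ogTagsViaSimpleXml(string $html): array
{
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Convert the DOM tree into a SimpleXMLElement rooted at <html>.
    $xml = simplexml_import_dom($doc);

    $result = [];
    foreach ($xml->xpath("//meta[starts-with(@property, 'og:')]") as $meta) {
        $result[(string) $meta['property']] = (string) $meta['content'];
    }
    return $result;
}
```

Usage would look like `print_r(ogTagsViaSimpleXml($html));` where `$html` holds the fetched page.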

MSS · Answer

This function does the job without any dependency or DOM parsing:

function getOgTags($html)
{
    // [^>]*? lets other attributes (such as itemprop) sit between property and content
    $pattern = '/<\s*meta\s+property="og:([^"]+)"[^>]*?\s+content="([^"]*)/i';
    if (preg_match_all($pattern, $html, $out)) {
        return array_combine($out[1], $out[2]);
    }
    return array();
}

Test code:

$x = '
<title>php - Using domDocument, and parsing info, I would like to get the &#39;href&#39; contents of an &#39;a&#39; tag - Stack Overflow</title>
<link rel="shortcut icon" href="https://cdn.sstatic.net/Sites/stackoverflow/img/favicon.ico?v=4f32ecc8f43d">
<link rel="Apple-touch-icon image_src" href="https://cdn.sstatic.net/Sites/stackoverflow/img/Apple-touch-icon.png?v=c78bd457575a">
<link rel="search" type="application/opensearchdescription+xml" title="Stack Overflow" href="/opensearch.xml">
<meta name="referrer" content="Origin" />
<meta property="og:type" content="website"/>
<meta property="og:url" content="https://stackoverflow.com/questions/5278418/using-domdocument-and-parsing-info-i-would-like-to-get-the-href-contents-of"/>
<meta property="og:image" itemprop="image primaryImageOfPage" content="https://cdn.sstatic.net/Sites/stackoverflow/img/Apple-touch-icon@2.png?v=73d79a89bded" />
<meta name="Twitter:card" content="summary"/>
<meta name="Twitter:domain" content="stackoverflow.com"/>
<meta name="Twitter:title" property="og:title" itemprop="title name" content="Using domDocument, and parsing info, I would like to get the &#39;href&#39; contents of an &#39;a&#39; tag" />
<meta name="Twitter:description" property="og:description" itemprop="description" content="Possible Duplicate: Regular expression for grabbing the href attribute of an A element This displays the what is between the a tag, but I would like a way to get the href contents as well. Is..." />';
echo '<pre>';
var_dump(getOgTags($x));

and you get:

array(3) {
  ["type"]=>
  string(7) "website"
  ["url"]=>
  string(119) "https://stackoverflow.com/questions/5278418/using-domdocument-and-parsing-info-i-would-like-to-get-the-href-contents-of"
  ["image"]=>
  string(85) "https://cdn.sstatic.net/Sites/stackoverflow/img/Apple-touch-icon@2.png?v=73d79a89bded"
}
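A regex returns attribute values exactly as they appear in the source, so HTML entities such as the &#39; sequences in the title of the test markup would stay encoded, whereas a DOM parser decodes them. A minimal sketch of a post-processing step follows; the helper name decodeOgValues and the sample array are hypothetical, and UTF-8 output is an assumption.

```php
<?php
/**
 * Hypothetical helper: decode HTML entities in values extracted by a regex.
 * Array keys (the property names) are left untouched.
 */
function decodeOgValues(array $tags): array
{
    return array_map(
        static fn(string $value): string =>
            html_entity_decode($value, ENT_QUOTES | ENT_HTML5, 'UTF-8'),
        $tags
    );
}

// Usage with hypothetical raw values, shaped like getOgTags() output.
$raw = [
    'title'     => 'Get the &#39;href&#39; of an &#39;a&#39; tag',
    'site_name' => 'Procter &amp; Gamble',
];
$decoded = decodeOgValues($raw);
print_r($decoded);
```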