Python: BeautifulSoup - Get an attribute value based on the name attribute

Question

I want to print an attribute value based on its name. Take, for example:

<META NAME="City" content="Austin">

I want to do something like this:

soup = BeautifulSoup(f)  # f is some HTML containing the above meta tag
for meta_tag in soup('meta'):
    if meta_tag['name'] == 'City':
        print(meta_tag['content'])

The code above gives a KeyError: 'name'. I think this is because name is used by BeautifulSoup itself, so it cannot be used as a keyword argument.

theharshest · Accepted Answer

It's pretty simple, use the following:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<META NAME="City" content="Austin">')
>>> soup.find("meta", {"name":"City"})
<meta name="City" content="Austin" />
>>> soup.find("meta", {"name":"City"})['content']
u'Austin'

Leave a comment if anything is unclear.
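To spell out why the dictionary form is needed, here is a minimal sketch. It assumes Python 3 and the built-in 'html.parser' backend, and the variable names city_tag and city are only illustrative. The first positional parameter of find() is itself called name (the tag name), so the HTML attribute "name" has to be passed through attrs.

```python
from bs4 import BeautifulSoup

html = '<META NAME="City" content="Austin">'
soup = BeautifulSoup(html, "html.parser")

# find()'s first parameter is the tag name, so the HTML attribute called
# "name" cannot be given as a keyword; it goes in the attrs dictionary.
city_tag = soup.find("meta", attrs={"name": "City"})

# find() returns None when nothing matches, so guard before indexing.
city = city_tag["content"] if city_tag is not None else None
print(city)
```

The None check matters on real pages, where the tag you are looking for may simply be absent.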

Delicious · Answer

theharshest answered the question, but here is another way to do the same thing. Also, in your example you have NAME in uppercase, while in your code you have name in lowercase.

from bs4 import BeautifulSoup

s = '<div class="question" id="get attrs" name="python" x="something">Hello World</div>'
soup = BeautifulSoup(s, 'html.parser')

attributes_dictionary = soup.find('div').attrs
print(attributes_dictionary)
# prints: {'class': ['question'], 'id': 'get attrs', 'name': 'python', 'x': 'something'}

print(attributes_dictionary['class'][0])
# prints: question

print(soup.find('div').get_text())
# prints: Hello World
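On the uppercase/lowercase point, a small check may help. This is a sketch assuming Python 3 and the built-in 'html.parser' backend, which lowercases tag and attribute names while parsing, so the uppercase NAME in the markup is still reachable as 'name'. The variable attribute_names is just an illustrative name.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<META NAME="City" content="Austin">', "html.parser")
meta_tag = soup.find("meta")

# The parser has already normalized NAME to name and META to meta.
attribute_names = sorted(meta_tag.attrs)
print(attribute_names)
print(meta_tag["name"])
```

So the case difference by itself should not cause the KeyError with this parser; other parsers may behave differently.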

Leonard Richardson · Answer

theharshest's answer is the best solution, but the problem you ran into is that a Tag object in Beautiful Soup acts like a Python dictionary. If you access tag['name'] on a tag that doesn't have a 'name' attribute, you'll get a KeyError.
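Because a Tag behaves like a dictionary, the loop from the question can be kept if the lookup uses .get(), which returns None for a missing attribute. A minimal sketch, assuming Python 3 and 'html.parser'; the extra charset meta tag is a made-up stand-in for whatever tag without a name attribute triggered the error in the real page.

```python
from bs4 import BeautifulSoup

# The first <meta> has no name attribute (hypothetical example input).
html = '<meta charset="utf-8"><meta name="City" content="Austin">'
soup = BeautifulSoup(html, "html.parser")

cities = []
for meta_tag in soup.find_all("meta"):
    # meta_tag["name"] would raise KeyError on the charset tag;
    # .get() returns None instead, so the comparison is simply False.
    if meta_tag.get("name") == "City":
        cities.append(meta_tag.get("content"))

print(cities)
```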

BrightMoon · Answer

The following works:

from bs4 import BeautifulSoup

soup = BeautifulSoup('<META NAME="City" content="Austin">', 'html.parser')
metas = soup.find_all("meta")
for meta in metas:
    print(meta.attrs['content'], meta.attrs['name'])

ron g · Answer

6 years late to the party, but I've been searching for how to extract an HTML element's tag attribute value, so for:

<span property="addressLocality">Ayr</span>

I want "addressLocality". I kept being directed back here, but the answers didn't really solve my problem.

How I managed to do it eventually:

>>> from bs4 import BeautifulSoup as bs
>>> soup = bs('<span property="addressLocality">Ayr</span>', 'html.parser')
>>> my_attributes = soup.find().attrs
>>> my_attributes
{u'property': u'addressLocality'}

As it's a dict, you can also use 'keys' and 'values':

>>> my_attributes.keys()
[u'property']
>>> my_attributes.values()
[u'addressLocality']

Hopefully this helps someone else!
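If the attribute name is known in advance, the value can also be read directly from the tag. A minimal sketch, assuming Python 3 and 'html.parser'; passing True as the value in attrs matches any tag that carries that attribute, whatever its value. The names span, property_value, and pairs are illustrative.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<span property="addressLocality">Ayr</span>', "html.parser")

# Match any tag that has a property attribute, regardless of its value.
span = soup.find(attrs={"property": True})

property_value = span["property"]   # the attribute value
text = span.get_text()              # the element's text content

# attrs is a plain dict, so items() yields (attribute name, value) pairs.
pairs = list(span.attrs.items())
print(property_value, text, pairs)
```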

Ujjaval Moradiya · Answer

You can also try this solution:

To find a value that is written in a span inside a table:

htmlContent

<table>
  <tr>
    <th> ID </th>
    <th> Name </th>
  </tr>
  <tr>
    <td> <span name="spanId" class="spanclass">ID123</span> </td>
    <td> <span>Bonny</span> </td>
  </tr>
</table>

Python code

from bs4 import BeautifulSoup

soup = BeautifulSoup(htmlContent, "lxml")
tables = soup.find_all("table")
for table in tables:
    storeValueRows = table.find_all("tr")
    # .string keeps the padding inside <th> ID </th>, so strip before comparing
    thValue = storeValueRows[0].find_all("th")[0].string.strip()
    if thValue == "ID":  # verify that this is the table I wanted
        # storeValueRows[1] is the second <tr>; find_all("span")[0] is its first <span>
        value = storeValueRows[1].find_all("span")[0].string
        value = value.strip()  # remove whitespace from both ends of the string
        # find using an attribute; 'class' is multi-valued, so a list comes back
        value = storeValueRows[1].find("span", {"name": "spanId"})['class']
        print(value[0])  # this will print spanclass
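The same lookup can probably be written more compactly with a CSS attribute selector. This is only a sketch: it assumes Python 3, a Beautiful Soup version whose select_one() has its selector backend available, and 'html.parser' in place of lxml so that no extra parser is required. The names html_content, span_text, and span_classes are illustrative.

```python
from bs4 import BeautifulSoup

html_content = """
<table>
  <tr><th> ID </th><th> Name </th></tr>
  <tr>
    <td> <span name="spanId" class="spanclass">ID123</span> </td>
    <td> <span>Bonny</span> </td>
  </tr>
</table>
"""

soup = BeautifulSoup(html_content, "html.parser")

# span[name="spanId"] selects the first <span> whose name attribute is spanId.
span = soup.select_one('span[name="spanId"]')

span_text = span.get_text(strip=True)  # text with surrounding whitespace removed
span_classes = span["class"]           # class is multi-valued, so this is a list
print(span_text, span_classes)
```

Unlike the row-by-row version, this does not check the header cell first, so it fits best when the name attribute is known to be unique in the page.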