How to use lxml to find an element by text?

Question

Suppose we have the following HTML:

<html>
  <body>
    <a href="/1234.html">TEXT A</a>
    <a href="/3243.html">TEXT B</a>
    <a href="/7445.html">TEXT C</a>
  </body>
</html>

How can I make it find the "a" element that contains "TEXT A"?

So far I have:

import lxml.html

root = lxml.html.document_fromstring(the_html_above)
e = root.find('.//a')

I have tried:

e = root.find('.//a[@text="TEXT A"]')

but that didn't work, because the "a" tags have no "text" attribute.

Is there a way to solve this in a manner similar to what I have tried?

unutbu · Accepted Answer

You are very close. Use text()= rather than @text (which refers to an attribute).

e = root.xpath('.//a[text()="TEXT A"]')

Or, if you only know that the text contains "TEXT A",

e = root.xpath('.//a[contains(text(),"TEXT A")]')

Or, if you only know that the text starts with "TEXT A",

e = root.xpath('.//a[starts-with(text(),"TEXT A")]')
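To see how the three predicates differ, here is a minimal sketch that runs each of them against the question's links plus one extra link. The fourth link ("/9999.html") and the helper name hrefs are assumptions added only for illustration; they are not part of the original question.

```python
import lxml.html as LH

# The question's links plus one hypothetical extra link whose text
# merely starts with "TEXT A".
html = '''
<html>
  <body>
    <a href="/1234.html">TEXT A</a>
    <a href="/3243.html">TEXT B</a>
    <a href="/7445.html">TEXT C</a>
    <a href="/9999.html">TEXT A (archive)</a>
  </body>
</html>'''

root = LH.fromstring(html)


def hrefs(expr):
    """Return the href of every element matched by the XPath expression."""
    return [a.get('href') for a in root.xpath(expr)]


exact = hrefs('.//a[text()="TEXT A"]')                # exact match only
partial = hrefs('.//a[contains(text(),"TEXT A")]')    # substring match
prefix = hrefs('.//a[starts-with(text(),"TEXT A")]')  # prefix match
attr = hrefs('.//a[@text="TEXT A"]')                  # looks for a text="..." attribute

print(exact)    # ['/1234.html']
print(partial)  # ['/1234.html', '/9999.html']
print(prefix)   # ['/1234.html', '/9999.html']
print(attr)     # []
```

The last query comes back empty for the same reason the attempt in the question failed: @text selects an attribute named text, and none of the links has one.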

See the XPath documentation for more on the available string functions.

For example,

import lxml.html as LH

text = '''\
<html>
  <body>
    <a href="/1234.html">TEXT A</a>
    <a href="/3243.html">TEXT B</a>
    <a href="/7445.html">TEXT C</a>
  </body>
</html>'''

root = LH.fromstring(text)
e = root.xpath('.//a[text()="TEXT A"]')
print(e)

yields

[<Element a at 0xb746d2cc>]
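Note that xpath() returns a list, whereas find() in the question returns a single element or None. A minimal sketch of picking the first match and reading its attributes might look like this (the variable names matches and first are chosen here for illustration):

```python
import lxml.html as LH

text = '''\
<html>
  <body>
    <a href="/1234.html">TEXT A</a>
    <a href="/3243.html">TEXT B</a>
    <a href="/7445.html">TEXT C</a>
  </body>
</html>'''

root = LH.fromstring(text)
matches = root.xpath('.//a[text()="TEXT A"]')

# Take the first match, or None when nothing matched.
first = matches[0] if matches else None

# Compare against None explicitly: an element without children is falsy.
if first is not None:
    print(first.get('href'), first.text)  # /1234.html TEXT A
```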

ToonAlfrink · Answer

Another way that seems more straightforward to me:

import lxml.html

results = []
root = lxml.html.fromstring(the_html_above)
for tag in root.iter():
    # tag.text is None for elements that have no leading text
    if tag.text and "TEXT A" in tag.text:
        results.append(tag)
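As a rough sketch, the same loop can be wrapped in a small helper and limited to "a" elements by passing a tag name to iter(). The function name find_by_text and its parameters are hypothetical, and the_html_above is filled in here with the markup from the question.

```python
import lxml.html

the_html_above = '''\
<html>
  <body>
    <a href="/1234.html">TEXT A</a>
    <a href="/3243.html">TEXT B</a>
    <a href="/7445.html">TEXT C</a>
  </body>
</html>'''


def find_by_text(root, needle, tag='a'):
    """Return the elements named `tag` whose leading text contains `needle`."""
    # el.text is None when an element has no text before its first child,
    # so check it before the substring test.
    return [el for el in root.iter(tag) if el.text and needle in el.text]


root = lxml.html.fromstring(the_html_above)
results = find_by_text(root, "TEXT A")
print([el.get('href') for el in results])  # ['/1234.html']
```

Like contains() in the accepted answer, this is a substring test, so it would also match link text such as "TEXT AB".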