python - BeautifulSoup: find an element by text using `find_all`, whether or not it contains child elements


For example:

import re
from bs4 import BeautifulSoup

bs = BeautifulSoup("<html><a>sometext</a></html>", "html.parser")
print(bs.find_all("a", text=re.compile(r"some")))

This returns [<a>sometext</a>]. But when the element being searched has a child element, e.g. an img:

bs = BeautifulSoup("<html><a>sometext<img /></a></html>", "html.parser")
print(bs.find_all("a", text=re.compile(r"some")))

it returns [].

Is there a way to use find_all to match the latter example?

You need to use a hybrid approach, since text= fails when an element has child elements as well as text.

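bs = BeautifulSoup("<html><a>sometext</a></html>", "html.parser")
reg = re.compile(r'some')
# e.text includes the text of nested elements; reg.match anchors at the
# start of that text (use reg.search to match anywhere in it)
elements = [e for e in bs.find_all('a') if reg.match(e.text)]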
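
The same filtering can also be pushed into find_all itself, because find_all accepts a function that is called once per tag. The following is a minimal sketch under that assumption, not the only way to do it. The helper name find_by_text is hypothetical, and it uses search rather than match so the pattern may occur anywhere in the text.

```python
import re
from bs4 import BeautifulSoup


def find_by_text(soup, name, pattern):
    """Hypothetical helper: tags called `name` whose full text matches `pattern`."""
    regex = re.compile(pattern)

    def matches(tag):
        # get_text() concatenates every string inside the tag, so a child
        # tag such as <img> does not hide the surrounding text.
        return tag.name == name and regex.search(tag.get_text()) is not None

    # find_all calls `matches` for each tag and keeps those returning True.
    return soup.find_all(matches)


soup = BeautifulSoup(
    "<html><a>sometext<img /></a><a>other</a></html>", "html.parser"
)
found = find_by_text(soup, "a", r"some")
print(found)
```

Note that get_text() also includes text from nested tags, so an a element whose only matching text sits inside a child tag would be returned as well.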

Background

When BeautifulSoup is searching for an element, and text is a callable, it eventually calls:

self._matches(found.string, self.text) 

In the two examples you gave, the .string property returns different things:

>>> bs1 = BeautifulSoup("<html><a>sometext</a></html>", "html.parser")
>>> bs1.find('a').string
'sometext'
>>> bs2 = BeautifulSoup("<html><a>sometext<img /></a></html>", "html.parser")
>>> bs2.find('a').string
>>> print(bs2.find('a').string)
None

The .string property looks like this:

@property
def string(self):
    """Convenience property to get the single string within this tag.

    :Return: If this tag has a single string child, return value
     is that string. If this tag has no children, or more than one
     child, return value is None. If this tag has one child tag,
     return value is the 'string' attribute of the child tag,
     recursively.
    """
    if len(self.contents) != 1:
        return None
    child = self.contents[0]
    if isinstance(child, NavigableString):
        return child
    return child.string
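
To make the cases in that docstring concrete, here is a small sketch that runs .string over four tags. The markup and the id values are invented for illustration only.

```python
from bs4 import BeautifulSoup

# Illustrative markup (an assumption, not from the question): one <a> per
# case described in the docstring.
soup = BeautifulSoup(
    "<p>"
    "<a id='one'>sometext</a>"           # single string child
    "<a id='two'>sometext<img /></a>"    # more than one child
    "<a id='three'><b>sometext</b></a>"  # single child tag: recurses into <b>
    "<a id='four'></a>"                  # no children
    "</p>",
    "html.parser",
)

results = {tag["id"]: tag.string for tag in soup.find_all("a")}
for tag_id, value in results.items():
    print(tag_id, repr(value))
```

Only the tags with exactly one child (directly or through a single nested tag) yield a string; the other two yield None, which is why a text= filter has nothing to match against.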

If you print out the contents, you can see why it returns None:

>>> print(bs1.find('a').contents)
['sometext']
>>> print(bs2.find('a').contents)
['sometext', <img/>]
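
Since contents is just a list of strings and tags, one possible variation is to match only against the strings that are direct children of the tag, ignoring text inside nested tags. This is a sketch under that assumption; own_text is a hypothetical helper name, and the second a element in the markup is invented to show the difference.

```python
import re
from bs4 import BeautifulSoup
from bs4.element import NavigableString


def own_text(tag):
    """Hypothetical helper: join only the strings that are direct children of `tag`."""
    return "".join(
        child for child in tag.contents if isinstance(child, NavigableString)
    )


soup = BeautifulSoup(
    "<html><a>sometext<img /></a><a><b>some</b>thing</a></html>",
    "html.parser",
)
reg = re.compile(r"some")

# The second <a> only contains "some" inside its <b> child, so it is skipped.
direct_matches = [a for a in soup.find_all("a") if reg.search(own_text(a))]
print(direct_matches)
```

Whether nested text should count is a design choice: use e.text when it should, and a direct-children filter like this when it should not.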
