python - BeautifulSoup - Missing tag under tag
So, I want to get the text of the "h1" tags. I'm using BeautifulSoup, and it works fine until there is no "h1" tag inside an "article" tag; then I get a "'NoneType' object has no attribute 'contents'" error. Here is the code:
    from bs4 import BeautifulSoup

    page = """<article>
      <a href="http://something"> </a>
      (missing "h1")
      <a href="http://something"> </a>
    </article>
    <article>
      <a href="http://something"> </a>
      <a href="http://something"> <h1>something</h1> </a>
    </article>
    <article>
      <a href="http://something"> </a>
      <a href="http://something"> <h1>something</h1> </a>
    </article>"""

    soup = BeautifulSoup(page, "lxml")
    h1s = []
    articles = soup.find_all("article")
    for i in range(1, len(articles)):
        h1s.append(articles[i].h1.contents)
This is what I get when I check the type of the h1 lookup for an article without an h1 tag and for one with it:
    >>> type(articles[0].h1)
    <type 'NoneType'>
    >>> type(articles[1].h1)
    <class 'bs4.element.Tag'>
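You should loop over articles, which is a list, and use the find_all() method to find every h1 inside the a tags, then add its text to h1s. find_all() returns an empty list when an article has no h1, so nothing is ever accessed on None. It seems this is what you want: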
    h1s = []
    articles = soup.find_all("article")
    for i in articles:
        # find_all() yields nothing for an article without an h1
        for x in i.find_all('h1'):
            h1s.append(x.text)
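If you only need the first h1 of each article, an explicit None check is another option. The following is a minimal sketch, not the only way to do it: it assumes a shortened version of the page from the question and uses Python's built-in "html.parser" instead of "lxml" so that it runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# Shortened version of the page from the question (an assumption for this sketch).
page = """
<article>
  <a href="http://something"></a>
  <a href="http://something"></a>
</article>
<article>
  <a href="http://something"></a>
  <a href="http://something"><h1>something</h1></a>
</article>
"""

# "html.parser" ships with Python; the question used "lxml".
soup = BeautifulSoup(page, "html.parser")

h1s = []
for article in soup.find_all("article"):
    h1 = article.find("h1")  # same lookup as article.h1; None when there is no h1
    if h1 is not None:
        h1s.append(h1.get_text())

print(h1s)  # ['something']
```

Unlike the find_all() loop, this keeps at most one h1 per article, so use it only when that is what you want.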