python - BeautifulSoup - Missing tag under tag


So, I want the text of the "h1" tags. I'm using BeautifulSoup, and it works fine until there is no "h1" tag inside an "article" tag; then I get a "'NoneType' object has no attribute 'contents'" error. Here is the code:

    from bs4 import BeautifulSoup

    page = """
    <article>
        <a href="http://something">
        </a>
        (missing "h1")
        <a href="http://something">
        </a>
    </article>
    <article>
        <a href="http://something">
        </a>
        <a href="http://something">
            <h1>something</h1>
        </a>
    </article>
    <article>
        <a href="http://something">
        </a>
        <a href="http://something">
            <h1>something</h1>
        </a>
    </article>
    """

    soup = BeautifulSoup(page, "lxml")

    h1s = []
    articles = soup.find_all("article")
    for i in range(1, len(articles)):
        # .h1 is None when the article has no <h1>, so .contents raises AttributeError
        h1s.append(articles[i].h1.contents)

These are the messages I get when I check the type for an article with an h1 tag and for one without:

    type(articles[0].h1)
    <type 'NoneType'>
    type(articles[1].h1)
    <class 'bs4.element.Tag'>

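You should loop over articles directly as a list, and use the find_all() method to get every h1 inside each article tag and add its text to h1s. find_all() returns an empty list when there is no h1, so the inner loop simply does nothing for that article instead of raising an error. It seems like this is what you want: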

    h1s = []
    articles = soup.find_all("article")
    for i in articles:
        # find_all() returns [] when the article has no <h1>
        for x in i.find_all('h1'):
            h1s.append(x.text)
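
If you would rather keep one entry per article, so that positions in h1s still line up with articles, a variant is to check for None explicitly. This is only a minimal sketch: the helper name first_h1_texts is made up for illustration, the page is shortened, and it uses the built-in "html.parser" rather than "lxml" so that it runs without extra packages.

```python
from bs4 import BeautifulSoup

page = """
<article>
  <a href="http://something"></a>
  <a href="http://something"></a>
</article>
<article>
  <a href="http://something"></a>
  <a href="http://something"><h1>something</h1></a>
</article>
"""

# Assumption: "html.parser" instead of the "lxml" parser used in the question.
soup = BeautifulSoup(page, "html.parser")


def first_h1_texts(soup):
    """Hypothetical helper: one entry per <article>, None where there is no <h1>."""
    texts = []
    for article in soup.find_all("article"):
        h1 = article.find("h1")  # find() returns None when nothing matches
        texts.append(h1.get_text(strip=True) if h1 is not None else None)
    return texts


h1s = first_h1_texts(soup)
print(h1s)
```

The find_all() loop above drops articles without an h1 entirely, while this version records None for them; which one fits depends on whether you need to know which article each heading came from.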
