html - Locating tags via styles using Python 2 and BeautifulSoup 4
I am trying to use BeautifulSoup 4 to extract text from specific tags in an HTML document. I have HTML that has a bunch of div tags like the following:
<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:42px; top:90px; width:195px; height:24px;">
<span style="font-family: fipxqm+arial-boldmt; font-size:12px"> futures daily market report financial gas <br/> 21-jul-2015 <br/> </span>
</div>
<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:54px; top:135px; width:46px; height:10px;">
<span style="font-family: fipxqm+arial-boldmt; font-size:10px"> commodity <br/> </span>
</div>
I am trying to get the text of the span tags inside every div tag whose style includes "left:54px".
I can get a single div if I use:
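from bs4 import BeautifulSoup

soup = BeautifulSoup(open(extracted_html_file))
# Matching on the complete style string returns only the div with exactly this style.
print soup.find_all('div', attrs={"style": "position:absolute; border: textbox 1px solid; "
                                           "writing-mode:lr-tb; left:42px; top:90px; "
                                           "width:195px; height:24px;"})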
It returns:
[<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:42px; top:90px; width:195px; height:24px;"><span style="font-family: fipxqm+arial-boldmt; font-size:12px">futures daily market report financial gas <br/>21-jul-2015 <br/></span></div>]
But that only gets me the one div that matches that exact styling. I want all the divs that match the "left:54px" style.
To do this, I've tried a few different ways:
soup = BeautifulSoup(open(extracted_html_file))
print soup.find_all('div', style='left:54px')
print soup.find_all('div', attrs={"style": "left:54px"})
print soup.find_all('div', attrs={"left": "54px"})
But all of these print statements return empty lists.
Any ideas?
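A string filter has to equal the whole style value, which is why 'left:54px' on its own matches nothing. You can pass in a regular expression instead of a string, according to the documentation here: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#the-keyword-arguments
A regular expression is applied with its search() method, so it matches anywhere inside the attribute value. So try this: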
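import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(open(extracted_html_file))
# The compiled pattern is searched for anywhere inside each div's style attribute.
soup.find_all('div', style=re.compile('left:54px'))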
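One caveat with a plain substring pattern: 'left:54px' would also match a style such as "margin-left:54px", and it would miss "left: 54px" written with a space. Below is a minimal sketch of a stricter pattern, using only the standard re module. The helper name style_declaration is hypothetical (it is not part of BeautifulSoup), and the last two style strings are made-up test cases rather than values from the question. The compiled pattern should be usable as the style= argument to find_all in the same way as the one above.

```python
import re


def style_declaration(prop, value):
    """Hypothetical helper: build a regex matching one 'prop: value'
    declaration inside an inline style string."""
    return re.compile(
        r'(?:^|;)\s*' + re.escape(prop) +      # start of string or previous ';'
        r'\s*:\s*' + re.escape(value) +        # optional spaces around ':'
        r'\s*(?:;|$)'                          # closing ';' or end of string
    )


left_54 = style_declaration('left', '54px')

styles = [
    # The two style strings from the question.
    "position:absolute; border: textbox 1px solid; writing-mode:lr-tb; "
    "left:54px; top:135px; width:46px; height:10px;",
    "position:absolute; border: textbox 1px solid; writing-mode:lr-tb; "
    "left:42px; top:90px; width:195px; height:24px;",
    # Made-up cases: a different property ending in 'left', and extra spacing.
    "margin-left:54px; top:10px;",
    "left: 54px",
]

matches = [bool(left_54.search(s)) for s in styles]
print(matches)  # [True, False, False, True]
```

For the HTML in the question the simpler re.compile('left:54px') is enough; the stricter form only matters if other properties ending in "left" or inconsistent spacing can appear in the document.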