python - Parsing: How do I strip out Unicode Characters?

- July 15, 2014

I wrote some code to grab the text in between the break elements on this webpage: http://www.virginiaequestrian.com/main.cfm?action=greenpages&sub=view&id=10478

I think I'm on the right track, but right now I'm getting bad values. Below are the results: [u'2133 craigs store road', u'afton,\r\n\t\tva \xa0\r\n\t\t22920', u'contact person:', u'email address:', u'website:', u'phone: 434-882-3150', u'']

I need to figure out how to strip the Unicode out of the result values. Can anyone help?

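import requests
from bs4 import BeautifulSoup

r = requests.get('http://www.virginiaequestrian.com/main.cfm?action=greenpages&sub=view&id=10478')
soup = BeautifulSoup(r.content, 'lxml')
tbl = soup.findAll('table')[2]    # third table on the page
contact = tbl.findAll('p')[0]     # first paragraph inside that table
list = []
for br in contact.findAll('br'):
    next = br.nextSibling         # text node that follows each <br>
    text = next.strip()
    list.append(text)
print list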

You can use the built-in replace method that the str type has.

text = next.strip().replace("\n", "").replace("\t", "").replace("\r", "")

That way you can replace \n, \t and \r with nothing, which removes them from the string.
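One caveat: the replace chain only touches \n, \t and \r, so the non-breaking space (\xa0) in the sample result stays in place, and "afton," ends up glued to "va". A minimal sketch of an alternative, assuming the goal is a single space between words, is to split on whitespace and rejoin. The helper name normalize_whitespace is hypothetical and is not part of the original answer.

```python
# Sample value copied from the results shown in the question.
raw = u'afton,\r\n\t\tva \xa0\r\n\t\t22920'

# The replace chain from the answer: removes \r, \n and \t only.
# The non-breaking space (\xa0) is left in, and "afton," is joined to "va".
chained = raw.replace("\n", "").replace("\t", "").replace("\r", "")


def normalize_whitespace(value):
    """Collapse each run of whitespace, including the non-breaking space,
    into a single space (hypothetical helper, for illustration only)."""
    # split() with no arguments splits a unicode string on any whitespace
    # and drops empty pieces, so leading/trailing whitespace disappears too.
    return " ".join(value.split())


cleaned = normalize_whitespace(raw)
```

Here chained is u'afton,va \xa022920', while cleaned is u'afton, va 22920'. Note also that the u prefix in the printed list is only how Python 2 displays unicode strings inside a list; it is not part of the data, and printing an individual item shows the plain text.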
