python - How to select all URLs in a web site excluding those of a given class?


I select URLs from a Twitter followers page using a regex. If I use https://twitter\.com/.* I select all the URLs matching the pattern in the web site, but I'd like to exclude the users in the "who to follow" section. Those URLs are within the whotofollow class. So, my question is: can I use XPath, regex, or a combination of both to select all URLs matching the previous pattern while excluding the URLs within the whotofollow class, in Python? Thanks!

dani

If I understood correctly, you can use an XPath like the one below. It takes every a tag whose class is not whotofollow and whose URL begins with https://twitter.com/, then returns the content of its href attribute.

//a[not(@class="whotofollow") and starts-with(@href, "https://twitter.com/")]/@href
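To run this from Python, one option is the third-party lxml library (an assumption, since the question does not name a parser). The sketch below uses made-up sample markup and a hypothetical helper called select_urls. It also shows a second, stricter expression for the case where whotofollow is set on a container element rather than on the link itself, which the question's wording ("within the whotofollow class") may imply.

```python
from lxml import html  # third-party: pip install lxml

# Hypothetical markup standing in for a followers page; the real page differs.
SAMPLE_HTML = """
<html><body>
  <a class="user" href="https://twitter.com/alice">alice</a>
  <a class="whotofollow" href="https://twitter.com/bob">bob</a>
  <a class="user" href="https://example.com/other">other</a>
  <div class="whotofollow">
    <a href="https://twitter.com/carol">carol</a>
  </div>
</body></html>
"""

# The expression from the answer: checks the class of the link itself only.
XPATH_ON_LINK = (
    '//a[not(@class="whotofollow") '
    'and starts-with(@href, "https://twitter.com/")]/@href'
)

# Assumed variant: also skips links nested inside any element whose
# class list contains the token "whotofollow".
XPATH_ON_ANCESTOR = (
    '//a[not(ancestor-or-self::*[contains('
    'concat(" ", normalize-space(@class), " "), " whotofollow ")]) '
    'and starts-with(@href, "https://twitter.com/")]/@href'
)


def select_urls(page_source, xpath):
    """Parse the HTML source and return the matching href values as strings."""
    tree = html.fromstring(page_source)
    return [str(url) for url in tree.xpath(xpath)]


if __name__ == "__main__":
    print(select_urls(SAMPLE_HTML, XPATH_ON_LINK))
    print(select_urls(SAMPLE_HTML, XPATH_ON_ANCESTOR))
```

On this sample, the first expression still returns the carol link because the class sits on the enclosing div, not on the a tag. The second expression drops it. Which one fits depends on how the real page is marked up.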
