python - How to select all URLs in a web site excluding those of a given class?
I select the URLs on a Twitter followers page using a regex. If I use https://twitter\.com/.* I select every URL on the page that matches the pattern, but I'd like to exclude the users in the "who to follow" section. Those URLs are within the whotofollow class. So my question is: can I use XPath, a regex, or a combination of both to select the URLs matching the previous pattern while excluding the URLs within the whotofollow class, in Python? Thanks!
dani
If I understood correctly, you can use the XPath below. It takes every a tag that does not have the class whotofollow and whose URL begins with https://twitter.com/, then takes the content of its href attribute:
//a[not(@class="whotofollow") and starts-with(@href, "https://twitter.com/")]/@href
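Note that an @class="whotofollow" test only looks at the a tag itself and only matches when that is its sole class. If the links instead sit inside a container carrying that class, as the question's wording suggests, the same filter can be sketched in plain Python with the standard library's html.parser instead of XPath. This is only a rough sketch under assumptions: the markup is reasonably well balanced, and the names LinkCollector and extract_links are made up for illustration.

```python
from html.parser import HTMLParser

PREFIX = "https://twitter.com/"
EXCLUDED_CLASS = "whotofollow"

# Elements without a closing tag; they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class LinkCollector(HTMLParser):
    """Collect hrefs starting with PREFIX, skipping excluded subtrees."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._depth = 0            # how many elements are currently open
        self._excluded_at = None   # depth at which an excluded element opened

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag not in VOID_TAGS:
            self._depth += 1
            # Remember where the outermost excluded element starts.
            if self._excluded_at is None and EXCLUDED_CLASS in classes:
                self._excluded_at = self._depth
        href = attrs.get("href") or ""
        if tag == "a" and self._excluded_at is None and href.startswith(PREFIX):
            self.links.append(href)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Leaving the excluded element re-enables collection.
        if self._excluded_at == self._depth:
            self._excluded_at = None
        self._depth = max(0, self._depth - 1)


def extract_links(html):
    """Return matching hrefs in document order (hypothetical helper)."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Calling extract_links(page_source) on the downloaded page returns the followers' URLs in document order. A link is skipped both when it carries the class itself and when any enclosing element does.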