python - Regex to get consecutive capitalized words with one or more words doesn't work


I'm trying to match runs of one or more consecutive capitalized words, but my regex doesn't seem to work.

    import re

    def extract(string):
        return re.findall(r'([A-Z][a-z]*(?=\s[A-Z])(?:\s+[A-Z][a-z]*)*)', string)

Here's the test case:

    def test_extract_capitalize_words(self):
        keywords = extract('this New York , London')
        self.assertEqual(['New York', 'London'], keywords)

It captures New York, but not London.

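The lookahead in your pattern requires a second capitalized word to follow, so a lone word such as London can never match. The pattern below matches either a run of consecutive capitalized words, or a single capitalized word followed by the end of the string.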

    >>> import re
    >>> s = 'this New York , London'
    >>> re.findall(r'\b[A-Z][a-z]*\b(?:(?:\s+[A-Z][a-z]*\b)+|$)', s)
    ['New York', 'London']
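The pattern above only accepts a single capitalized word when it sits at the very end of the string. As a rough sketch of what that means in practice, the snippet below wraps it in a function and compares it with a looser variant. The names `ANSWER_PATTERN`, `ANY_RUN_PATTERN` and `extract_any_run`, and the second pattern itself, are illustrative assumptions and not part of the answer.

```python
import re

# Pattern from the answer: a capitalized word followed either by more
# capitalized words or by the end of the string.
ANSWER_PATTERN = re.compile(r'\b[A-Z][a-z]*\b(?:(?:\s+[A-Z][a-z]*\b)+|$)')

# Hypothetical alternative (not from the answer): any run of one or more
# capitalized words, wherever it appears in the string.
ANY_RUN_PATTERN = re.compile(r'\b[A-Z][a-z]*\b(?:\s+[A-Z][a-z]*\b)*')


def extract(text):
    """Return capitalized runs using the answer's pattern."""
    return ANSWER_PATTERN.findall(text)


def extract_any_run(text):
    """Return every run of one or more capitalized words."""
    return ANY_RUN_PATTERN.findall(text)


print(extract('this New York , London'))
# ['New York', 'London']
print(extract('Paris and this New York , London'))
# ['New York', 'London']  (the lone 'Paris' is not at the end, so it is skipped)
print(extract_any_run('Paris and this New York , London'))
# ['Paris', 'New York', 'London']
```

Which variant fits depends on whether a single capitalized word in the middle of a sentence, such as one that merely starts the sentence, should count as a keyword.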
