python - Regex to get consecutive capitalized words with one or more words doesn't work
I'm trying to capture runs of one or more consecutive capitalized words, but it doesn't work for me.

    import re

    def extract(string):
        return re.findall(r'([A-Z][a-z]*(?=\s[A-Z])(?:\s+[A-Z][a-z]*)*)', string)

Here's my test case:

    def test_extract_capitalize_words(self):
        keywords = extract('this New York , London')
        self.assertEqual(['New York', 'London'], keywords)

It captures New York, but not London.
This will match consecutive capitalized words, or a single capitalized word followed by the end of the line.

    >>> import re
    >>> s = 'this New York , London'
    >>> re.findall(r'\b[A-Z][a-z]*\b(?:(?:\s+[A-Z][a-z]*\b)+|$)', s)
    ['New York', 'London']
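Note that the pattern above only picks up a lone capitalized word when it sits at the very end of the input. The original attempt fails for a related reason: its lookahead `(?=\s[A-Z])` demands that another capitalized word follows, so a single word like London can never match. If the goal is any run of one or more capitalized words wherever it appears, one possible sketch is to drop the lookahead and make the repeated part optional. The names `CAPITALIZED_RUN` and `extract_runs` are made up for illustration, and the pattern assumes plain ASCII words of the form `[A-Z][a-z]*`:

```python
import re

# Hypothetical pattern name. A "run" here is one capitalized word followed by
# zero or more whitespace-separated capitalized words (ASCII letters only).
CAPITALIZED_RUN = re.compile(r'\b[A-Z][a-z]*(?:\s+[A-Z][a-z]*)*\b')

def extract_runs(text):
    """Return every run of one or more consecutive capitalized words."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return CAPITALIZED_RUN.findall(text)

print(extract_runs('this New York , London'))
print(extract_runs('London is bigger than New York'))
```

With this version a lone capitalized word in the middle of the string (the first word of the second example) is returned as well, which may or may not be what you want.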