|Reported by:||anonymous||Owned by:||rjollos|
You don't parenthesize your pipe-separated list of tokens, meaning the \b directives that you put at the beginning and end of the regexp only apply to the tokens at either end.
What you want is \b(abc|def|ghi|jkl)\b NOT \babc|def|ghi|jkl\b
-- or better yet \b(?:abc|def|ghi|jkl)\b
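To make the precedence issue concrete, here is a minimal sketch using the placeholder tokens from above. The variable names are made up for illustration, and raw strings are used so that \b really is a word-boundary assertion. It shows that without the group, the middle alternatives are not anchored at all and match inside other words.

```python
import re

# Placeholder tokens from the example above (not the real EMOTICONS keys).
tokens = ['abc', 'def', 'ghi', 'jkl']
alternation = '|'.join(re.escape(t) for t in tokens)

# Ungrouped: parsed as (\babc) | def | ghi | (jkl\b)
ungrouped = re.compile(r'\b' + alternation + r'\b')
# Grouped: \b applies to both ends of every alternative
grouped = re.compile(r'\b(?:' + alternation + r')\b')

text = 'abc xdefx ghi jkl'
ungrouped_hits = ungrouped.findall(text)
grouped_hits = grouped.findall(text)

print(ungrouped_hits)  # ['abc', 'def', 'ghi', 'jkl'] ('def' matched inside 'xdefx')
print(grouped_hits)    # ['abc', 'ghi', 'jkl']
```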
(Note that the list will be sorted into some arbitrary order by Python based on its hashing algorithm; the order you specify them in the EMOTICONS map is not relevant.)
Basically, two of the emotes in your map will not be rendered, but which two is unpredictable.
With the above list, only the tokens 'def' and 'ghi' are successfully matched and rendered into emotes. 'abc' and 'jkl' are never matched and never rendered into emotes, regardless of whether they are surrounded by word boundaries or are substrings.
When I tried parenthesizing the expression (thus applying the \b directive to the beginning and end of EVERY token) I found that NOTHING is successfully matched.
Removing the \b directives appears to work just fine - all tokens are matched, and only if they are not substrings (in other words, there seems to be an implicit check for word boundaries anyway).
A quick check of the Python tutorial shows that \b can also be read as the escape sequence for the backspace character. I don't (care to) know Python very well, so I don't know if that's what you've done wrong, and I wasn't interested in trying to find out (whitespace sensitivity lol), so I'll leave the investigation into that possibility to you.
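For what it's worth, the backspace theory can be checked in isolation. The sketch below assumes the pattern was written as an ordinary (non-raw) string literal; I have not confirmed that this is what the plugin source actually does. Under that assumption it reproduces both symptoms described above: only the middle tokens match in the ungrouped form, and nothing matches in the grouped form.

```python
import re

text = 'abc def ghi jkl'

# In a non-raw string literal, '\b' is the backspace character (0x08),
# so the regex engine looks for a literal backspace, not a word boundary.
non_raw_ungrouped = '\babc|def|ghi|jkl\b'
non_raw_grouped = '\b(?:abc|def|ghi|jkl)\b'
# In a raw string, the backslash survives and \b is a word boundary.
raw_grouped = r'\b(?:abc|def|ghi|jkl)\b'

starts_with_backspace = non_raw_grouped[0] == '\x08'
ungrouped_hits = re.findall(non_raw_ungrouped, text)
grouped_hits = re.findall(non_raw_grouped, text)
raw_hits = re.findall(raw_grouped, text)

print(starts_with_backspace)  # True
print(ungrouped_hits)         # ['def', 'ghi']
print(grouped_hits)           # []
print(raw_hits)               # ['abc', 'def', 'ghi', 'jkl']
```

If that is the cause, prefixing the pattern string with r would be enough to make the grouped form behave as intended.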
Without the \b directives, there is no longer any reason to parenthesize the expression either.
In short, your regexp seems mighty dodgy. This works just fine for me:
pattern = '|'.join([re.escape(token) for token in EMOTICONS])
yield pattern, _replace
Change History (4)
comment:2 in reply to: ↑ description Changed 2 years ago by rjollos
- Status changed from new to assigned
comment:3 Changed 2 years ago by rjollos
- Resolution set to fixed
- Status changed from assigned to closed