#11622 closed defect (fixed)
users can post or comment on trac-hacks.org in the name of any registered user
Reported by: | falkb | Owned by: | Ryan J Ollos |
---|---|---|---|
Priority: | high | Component: | AccountManagerPlugin |
Severity: | critical | Keywords: | |
Cc: | Ryan J Ollos, Steffen Hoffmann, lkraav | Trac Release: |
Description
I noticed I can comment on tickets here on trac-hacks.org as falkb without being asked for my password.
I have no problem with anonymous users posting or commenting with fictive names or email addresses. I think this is even a useful feature for people who are don't forced to register when they want to report something.
But I think if one tries to post or comment with a registered account name or registered email address, such action has to be acknowledged by a password barrier. This way one may still post with fictive names (fine with me), but is not allowed to post in the name of a known user without confirmation.
I think this is important to prevent account abuse.
Attachments (5)
Change History (61)
comment:1 Changed 11 years ago by
comment:3 Changed 11 years ago by
Replying to falkb:
Let's see if this works...
That was me. It obviously works. However, since the same works on teo, too I think this is a generic Trac issue rather than one specific to trac-hacks.org...
comment:4 follow-up: 6 Changed 11 years ago by
The issue has been known on t.e.o for a long time. I saw one instance of someone pretending to be ecarter in a ticket a long time ago. There have also been some ticket comments and page edits using usernames that are active on t-h.o, such as myself or falkb. The edits were obviously spam in those cases.
I think we can eventually implement a username check, either in AccountManagerPlugin or TracHacksPlugin.
comment:5 Changed 11 years ago by
Hm, I'm worried to see t:#1890, that exactly reflects the current state: unchanged source regarding this issue for 8 years by now and not even a comment within the last 3 years.
comment:6 Changed 11 years ago by
Replying to rjollos:
The issue has been known on t.e.o for a long time. ...
I do remember some discussion on the issue myself, yes.
I think we can eventually implement a username check, either in AccountManagerPlugin or TracHacksPlugin.
Good point. In the same notion as we prevent changes to invalid email addresses with email validation enabled we could enforce some sort of username clearance too. I'll start to work on this right after the TagsPlugin release, unless someone beats me with an earlier patch attached to an enhancement request for AccountManagerPlugin.
comment:7 follow-up: 8 Changed 11 years ago by
After reading through trac:#1890, I'm left feeling that all of the proposed solutions will require quite a bit of effort, although a good solution seems to be outlined in trac:comment:53:ticket:1890 and trac:comment:94:ticket:1890.
A reasonably quick solution for trac-hacks.org would be to implement ITicketManipulator
and IWikiManipulator
: if req.authname == 'anonymous'
, then check if author
matches a sid
in the session
table with authenticated = 1
. If a match is found, reject the change and tell the user they must login in order to save a change with that username.
A minority of users may not like the requirement to login, but I think it's reasonable to require a login if you've previously registered an account.
comment:8 follow-up: 9 Changed 11 years ago by
Replying to rjollos:
A reasonably quick solution for trac-hacks.org would be to implement
ITicketManipulator
andIWikiManipulator
: ifreq.authname == 'anonymous'
, then check ifauthor
matches asid
in thesession
table withauthenticated = 1
. If a match is found, reject the change and tell the user they must login in order to save a change with that username.
I'd go one step further and disallow to even change an anonymous session id to a registered username. While this would be easier done in Trac core, we can do so inside AcctMgr too.
A minority of users may not like the requirement to login, but I think it's reasonable to require a login if you've previously registered an account.
Absolutely reasonable for the sake of clarity, yes.
comment:9 follow-up: 10 Changed 11 years ago by
Replying to hasienda:
I'd go one step further and disallow to even change an anonymous session id to a registered username.
So we just check if the author
matches a sid
in the session
table and don't require authenticated = 1
? I think that would work since the sid
s are hashes in the case of an unauthenticated session.
While this would be easier done in Trac core
If there's an easier fix to the Trac core, I think the issue would be worth pushing there. However we fix this on t-h.o, they'll probably want to fix it on t.e.o as well.
comment:10 Changed 11 years ago by
Replying to rjollos:
Replying to hasienda:
I'd go one step further and disallow to even change an anonymous session id to a registered username.
So we just check if the
author
matches asid
in thesession
table and don't requireauthenticated = 1
? I think that would work since thesid
s are hashes in the case of an unauthenticated session.
Yes, that's the plan, and ideally we'll do so in Trac core code for setting non-random, custom session id's, but I think that at least meanwhile we can intercept on a higher level, or at a different end, i.e. by using a specially crafted authenticator to instantly stop such ambiguous sessions from proceeding.
Changed 11 years ago by
Attachment: | 20140316_prevent-ambiguous-sid.patch added |
---|
proposed changes for preventing ambiguous anonymous session IDs using AccountManagerPlugin
comment:12 follow-up: 20 Changed 11 years ago by
Proposed changes do not alter existing IDs but prevent everyone from choosing an existing authenticated session ID now and forever.
Hint: Any anonymous session ID that is identical to an authenticated session ID is deleted automatically next time an anonymous session is promoted to authenticated session ID of the same name (see Session.promote_session()
in t:source:trunk/trac/web/session.py for details).
comment:13 Changed 11 years ago by
Instead of "Choose another session ID, please."
I'd rather say "Choose another session ID or login, please."
Changed 11 years ago by
Attachment: | 20140317_prevent-ambiguous-sid.patch added |
---|
2nd version including improved warning message according to falkb's recommendation
comment:15 follow-up: 18 Changed 11 years ago by
More suggestions welcome. Or should it have been that easy? ;-)
comment:16 Changed 11 years ago by
fine with me if it simply prevents to post in the name of registered others. I wonder why there was so much discussion in t:#1890 over so many years.
comment:17 Changed 11 years ago by
Cc: | lkraav added |
---|
comment:18 follow-up: 19 Changed 11 years ago by
Replying to hasienda:
More suggestions welcome. Or should it have been that easy? ;-)
just do activate the patch! We'll see whether it helps or not
comment:19 Changed 11 years ago by
Replying to paul@…:
Replying to hasienda:
More suggestions welcome. Or should it have been that easy? ;-)
just do activate the patch! We'll see whether it helps or not
One step at a time. We don't want to introduce bugs by careless action. Even more since discussion on the corresponding ticket for Trac core went for years by now, so I would rather hear a bit more feedback before applying the patch to AccountManagerPlugin's trunk
.
Afterwards we could roll it to this Trac instance, if a) I release another minor version of current stable branch of the plugin, effectively forwarding the change to every updated plugin installation, or b) someone patches the plugin source exclusively for the installation here.
Even with broad approval there is still no solution for Trac core. Many Trac application including t.e.o do not utilize AccountManagerPlugin, that we plan to alter here. And the currently suggested approach itself is generally flawed, because it relies on checking the Trac db table session
. Partially repeating comments from the other ticket here for the list of shortcomings: There is not yet a way of
- detecting authorized users, that did not log-in to Trac before
- preventing irritation by username-alike session IDs (without knowing the full user list)
- preventing choice of session IDs of accounts created in the future (likely a 'cantfix', but worth to look into for another pre-registration sanity check for AccountManager's registration procedure)
comment:20 follow-ups: 21 26 Changed 11 years ago by
Replying to hasienda:
Proposed changes do not alter existing IDs but prevent everyone from choosing an existing authenticated session ID now and forever.
I tested out the patch. It does prevent a user from populating the name
attribute (SESSION_ATTRIBUTE.name
) of an anonymous session with the sid
of an authenticated user.
However, it doesn't prevent users from entering the sid
of an authenticated user in the reporter field when creating tickets, in the author field when commenting on tickets and in the author field when editing the wiki. It seems that the latter is the more serious issue that needs to be addressed.
Regarding the code, this line confused me a bit:
not (req.authname and req.authname != 'anonymous')
The intention is more clear to me with:
(not req.authname or req.authname == 'anonymous')
comment:21 follow-up: 22 Changed 11 years ago by
Replying to rjollos:
However, it doesn't prevent users from entering the
sid
of an authenticated user in the reporter field when creating tickets, in the author field when commenting on tickets and in the author field when editing the wiki. It seems that the latter is the more serious issue that needs to be addressed.
Sorry, I forgot about these places, because I generally do not allow such anonymous contributions in Trac applications I manage and always care to log-in myself before acting anywhere else too. This needs to get addressed by using the manipulator interfaces indeed.
Regarding the code, this line confused me a bit:
not (req.authname and req.authname != 'anonymous')The intention is more clear to me with:
(not req.authname or req.authname == 'anonymous')
While the expressions are logically equivalent NOT(a AND b) == NOT(a) OR NOT(b), there is a subtle difference, that let python programmers prefer the first term: First and-ed statement will skip further evaluation of req.authname, if the value is empty or None), the 2nd or-ed statement needs to always check both parts.
comment:22 follow-up: 23 Changed 11 years ago by
Replying to hasienda:
there is a subtle difference, that let python programmers prefer the first term: First and-ed statement will skip further evaluation of req.authname, if the value is empty or None), the 2nd or-ed statement needs to always check both parts.
More specifically I would say, a python programmer would saw the need for the optimization would prefer the first term. It is no matter though, as it may only be me that tends to get confused by the logic, and anything confusing can always be clarified with a comment.
comment:23 follow-up: 25 Changed 11 years ago by
Replying to rjollos:
More specifically I would say, a python programmer would saw the need for the optimization would prefer the first term. It is no matter though, as it may only be me that tends to get confused by the logic, and anything confusing can always be clarified with a comment.
I've seen the same pattern in Trac code several times and got confused about the logic as you did, even replaced it. From that experience I tell you, that it is not always about optimization. Instead of req.authname
think of a method, that could return a list. We always want to look at the first list element, but in some cases the list might as well be empty. Here the first statement really shines, as it prevents an IndexError: list index out of range
for the empty list, so it is in general a safer expression - nice defensive design example IMHO.
comment:24 follow-up: 31 Changed 11 years ago by
Not to turn this into a programming theory discussion, but should Yoda conditions be also generally used in Python? Not sure if so much applicable to the AND version, but definitely for the OR version.
not (req.authname and 'anonymous' != req.authname) (not req.authname or 'anonymous' == req.authname)
comment:25 Changed 11 years ago by
Replying to hasienda:
Regarding the code, this line confused me a bit:
not (req.authname and req.authname != 'anonymous')The intention is more clear to me with:
(not req.authname or req.authname == 'anonymous')While the expressions are logically equivalent NOT(a AND b) == NOT(a) OR NOT(b), there is a subtle difference, that let python programmers prefer the first term: First and-ed statement will skip further evaluation of req.authname, if the value is empty or None), the 2nd or-ed statement needs to always check both parts.
The 2nd or-ed statement also skips the req.authname == 'anonymous'
part if not req.authname
. The or
operator gets short-circuited just like and
.
Replying to hasienda:
Instead of
req.authname
think of a method, that could return a list. We always want to look at the first list element, but in some cases the list might as well be empty. Here the first statement really shines, as it prevents anIndexError: list index out of range
for the empty list, so it is in general a safer expression - nice defensive design example IMHO.
>>> list = [] >>> not(list and list[0] != 'anonymous') True >>> (not list or list[0] == 'anonymous') True
comment:26 Changed 11 years ago by
Replying to rjollos:
However, it doesn't prevent users from entering the
sid
of an authenticated user in the reporter field when creating tickets, in the author field when commenting on tickets and in the author field when editing the wiki. It seems that the latter is the more serious issue that needs to be addressed.
Updated patch addresses all but the initial reporter field. I've just overlooked that, sorry.
Changed 11 years ago by
Attachment: | 20140421_prevent-ambiguous-sid.patch added |
---|
3rd version including resource manipulator interface implementations as well as hard request deflection for other cases
comment:27 Changed 11 years ago by
Right, the bogus anonymous reporter was handled as a generic case. Not as graceful as I wanted, but still a proof for other author input not handled by manipulators yet.
Just updated the patch in-place, because the additional change was only to include 'newticket' as another exception from the generic case. So it shall be complete now. More comments?
comment:28 follow-up: 29 Changed 11 years ago by
I'll test and get you feedback within the next day or so.
Are you considering creating a 0.4.5 release that includes this change? That would allow us to deploy this soon and not depend on a patch instance of AccountManagerPlugin.
comment:29 Changed 11 years ago by
Replying to rjollos:
I'll test and get you feedback within the next day or so.
Would be great, thanks.
Are you considering creating a 0.4.5 release that includes this change? That would allow us to deploy this soon and not depend on a patch instance of AccountManagerPlugin.
I didn't think about it, but given the potential of confusion caused by current behavior I would be willing to push it, this way or another (finalize acct_mgr-0.5
).
comment:31 Changed 11 years ago by
Replying to lkraav:
Not to turn this into a programming theory discussion, but should Yoda conditions be also generally used in Python?
Steffen captured this on the DevGuide#CodingStyle page. It could be discussed on the TracDev mailing list to see if this is something we'd want to adopt in t:TracDev/CodingStyle.
From the cases I've considered thus far, there is stronger justification for yoda conditions in C. In Python, the following is a syntax error:
if a = 3: print 'yes'
However, the following case would justify their use:
# suppose you intend this b = a == some_constant # but instead you type this, which is not a syntax error b = a = some_constant # on the other hand, the "yoda condition" form would be a syntax error b = some_constant = a # however, using parenthesis would result in a syntax error as well b = (a = some_constant)
comment:32 follow-ups: 33 34 Changed 11 years ago by
About the patches, I think it is easy to escape the checking of author's name using Unicode zero-width space (U+200B), Right-to-left mark (U+200F), etc.
comment:33 Changed 11 years ago by
Replying to jun66j5:
About the patches, I think it is easy to escape the checking of author's name using Unicode
zero-width space (U+200B), Right-to-left mark (U+200F), etc.
Thanks for picking up on this, but you intention is not completely clear to me yet. Do you mean to encourage me to improve towards including Unicode white space, or do you feel that the whole approach is flawed somehow? I vaguely remember checking for such flag to no avail when developing for this feature. Any pointer on where to head from here would be appreciated.
comment:34 Changed 11 years ago by
Replying to jun66j5:
About the patches, I think it is easy to escape the checking of author's name using Unicode zero-width space (U+200B), Right-to-left mark (U+200F), etc.
That is a good point. I can reproduce by pasting a few ZWSP characters into the author field before typing the name of an authenticated author. We could use stripws
from the trac core, which is available since 1.0: trac:browser:/tags/trac-1.0/trac/util/text.py@:101#L98. For AccountManagerPlugin we'll need to copy the function to the compat
module.
Btw, I had been wondering for some time why ZWSP had to be explicitly included in the regex along with \s
. A possible explanation is: SO:8928710/121694.
Everything else with the patch looks good to me. Revised patch: 20140616_prevent-ambiguous-sid.patch.
It would be great to apply this to the AccountManagerPlugin and roll out a new version that we can use on t-h.o. Thanks for your efforts to implement this, and I apologize for the delay in my reply.
Changed 11 years ago by
Attachment: | 20140616_prevent-ambiguous-sid.patch added |
---|
comment:35 Changed 11 years ago by
I think we should remove the zero-width characters from entire of the author rahter than strip from trailing and leading of the author.
Code point | Bytes | Example | Notes |
---|---|---|---|
U+180E | "jun" U+180E "66j5" | jun66j5 | Maybe requires Mongolian Font |
U+200B | "jun" U+200B "66j5" | jun66j5 | |
U+200C | "jun" U+200C "66j5" | jun66j5 | |
U+200D | "jun" U+200D "66j5" | jun66j5 | |
U+2060 | "jun" U+2060 "66j5" | jun66j5 | |
U+FEFF | "jun" U+FEFF "66j5" | jun66j5 |
See also https://www.cs.tut.fi/~jkorpela/chars/spaces.html and http://en.wikipedia.org/wiki/Space_%28punctuation%29#Spaces_in_Unicode.
Also, Directional Formatting Characters can be used to bypass the checking.
Code point | Bytes | Example |
---|---|---|
U+200E | "jun" U+200E "66j5" | jun66j5 |
U+200F | "jun" U+200F "66j5" | jun66j5 |
http://www.unicode.org/reports/tr9/#Directional_Formatting_Characters.
comment:36 Changed 11 years ago by
Removing all zero-width characters from the string seems fairly straightforward. Do you think you've documented all of the zero-width characters, or do we need to do some additional searching? The list of all possible unicode characters is quite overwhelming.
comment:37 follow-up: 38 Changed 11 years ago by
I cannot find the documentation about ZWSPs. Instead, found FAQ in unicode.org, Q: Which characters should be displayed as invisible, if not supported?.
According to the answer, the following list is all of ZWSP characters.
Category | Code point | Example | |
---|---|---|---|
cursive joiners | U+200C | ab | |
cursive joiners | U+200D | ab | |
bidirectional format controls | U+061C | ab | |
bidirectional format controls | U+200E | ab | |
bidirectional format controls | U+200F | ab | |
bidirectional format controls | U+202A | ab | |
bidirectional format controls | U+202B | ab | |
bidirectional format controls | U+202C | ab | |
bidirectional format controls | U+202D | ab | |
bidirectional format controls | U+202E | ab | |
bidirectional format controls | U+2066 | ab | |
bidirectional format controls | U+2067 | ab | |
bidirectional format controls | U+2068 | ab | |
bidirectional format controls | U+2069 | ab | |
the soft hyphen | U+00AD | ab | |
word joiners | U+2060 | ab | |
word joiners | U+FEFF | ab | |
the zero width space | U+200B | ab | |
invisible math operators | U+2061 | ab | |
invisible math operators | U+2062 | ab | |
invisible math operators | U+2063 | ab | |
invisible math operators | U+2064 | ab | |
Jamo filler characters | U+115F | aᅟb | |
Jamo filler characters | U+1160 | aᅠb | |
variation selectors | U+180B | a᠋b | |
variation selectors | U+180C | a᠌b | |
variation selectors | U+180D | a᠍b | |
variation selectors | U+FE00 – U+FE0F | a︀b | a️b |
variation selectors | U+E0100 – U+E01EF | a󠄀b | a󠇯b |
comment:38 Changed 10 years ago by
Replying to jun66j5:
I cannot find the documentation about ZWSPs. Instead, found FAQ in unicode.org, Q: Which characters should be displayed as invisible, if not supported?.
According to the answer, the following list is all of ZWSP characters.
Thanks, I'll make a modified version of stripws, and after the patch is complete we can push the change back to the Trac core if the modified function is generally useful. If anyone else wants to take the lead on revising the patch, please fell free.
comment:39 Changed 10 years ago by
Component: | TracHacks → AccountManagerPlugin |
---|---|
Owner: | changed from Michael Renzmann to Steffen Hoffmann |
Moving this to AccountManagerPlugin since the patch will be applied there.
comment:40 Changed 10 years ago by
I've posted this comment as falkb again without being asked for my password. :-(
comment:42 Changed 10 years ago by
Sorry, my comment was not meant rude. I just see something has happened here in the meanwhile (which could be partial commits on the way to the 'fixed' state) but currently I don't have time to read it all. So it was just a very fast contribution of my user test, with the idea of better testing too much than to less. Friendly greetings to you all :-)
comment:43 follow-up: 44 Changed 10 years ago by
I see. Well I apologize and should not have assumed you were just nagging us ;)
The patch needs to be revised per comment:32. It's not currently my highest priority, but I would like to get it fixed for AccountManagerPlugin release 0.5.
comment:44 Changed 10 years ago by
Replying to rjollos:
...
I would like to get it fixed for AccountManagerPlugin release 0.5.
Great. I did not look into Jun's contribution so far as well, but this should really go into next release, given the rather big impact on (anonymous) request handling.
comment:45 Changed 10 years ago by
Owner: | changed from Steffen Hoffmann to Ryan J Ollos |
---|---|
Status: | new → accepted |
comment:47 Changed 8 years ago by
This ticket was long overdue.
I tried including the variation selectors in the regex, but the tests fail. Do the code points outside plane 0 need to be specified in a different way? Anyway, I don't think that minor point is too important.
-
acct_mgr/tests/util.py
47 47 def test_remove_zwsp(self): 48 48 self.assertEqual(u'user', remove_zwsp(u'\u200buser\u200b')) 49 49 self.assertEqual(u'user', remove_zwsp(u'\u200fu\ufe00ser\u061c')) 50 self.assertEqual(u'user', remove_zwsp(u'fu\ue0100ser')) 50 51 51 52 52 53 def test_suite(): -
acct_mgr/util.py
129 129 130 130 _zwsp_re = re.compile(u'[\\s\u200b-\u200f\u061c\u202a-\u202e' 131 131 u'\u2066-\u2069\u00ad\u2060\ufeff\u2061-\u2064' 132 u'\u115f\u1160\u180b-\u180d\ufe00-\ufe0f ]+',133 re.UNICODE)132 u'\u115f\u1160\u180b-\u180d\ufe00-\ufe0f' 133 u'\ue0100-\ue01ef]+', re.UNICODE) 134 134 135 135 136 136 def remove_zwsp(text):
$ python -m unittest acct_mgr.tests.util .F ====================================================================== FAIL: test_remove_zwsp (acct_mgr.tests.util.UtilTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "acct_mgr/tests/util.py", line 48, in test_remove_zwsp self.assertEqual(u'user', remove_zwsp(u'\u200buser\u200b')) AssertionError: u'user' != u'' - user + ---------------------------------------------------------------------- Ran 2 tests in 0.001s FAILED (failures=1)
I'll deploy changes to trac-hacks.org within a few days and make a blog post.
If anyone complains about the new behavior running TracAccountManager in their Trac instance, we can move the manipulators and request filter into a component that can be disabled.
comment:48 Changed 8 years ago by
The u'fu\ue0100ser'
should be u'fu\U000e0100ser'
.
Linux:
Python 2.7.6 (default, Jun 22 2015, 17:58:13) [GCC 4.8.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> [hex(ord(c)) for c in u'fu\ue0100ser'] ['0x66', '0x75', '0xe010', '0x30', '0x73', '0x65', '0x72'] >>> [hex(ord(c)) for c in u'fu\U000e0100ser'] ['0x66', '0x75', '0xe0100', '0x73', '0x65', '0x72']
Windows:
Python 2.7.10 (default, May 23 2015, 09:44:00) [MSC v.1500 64 bit (AMD64)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> [hex(ord(c)) for c in u'fu\U000e0100ser'] ['0x66', '0x75', '0xdb40', '0xdd00', '0x73', '0x65', '0x72']
If Python is narrow build, U+10000 and later characters cannot be used in regular expression. We should use surrogate pair.
>>> import re >>> re.compile(u'[\U000e0100-\U000e01ef]+', re.UNICODE) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "C:\usr\apps\python27-amd64\lib\re.py", line 194, in compile return _compile(pattern, flags) File "C:\usr\apps\python27-amd64\lib\re.py", line 251, in _compile raise error, v # invalid expression sre_constants.error: bad character range >>> re.compile(u'\udb40[\udd00-\uddef]+', re.UNICODE) <_sre.SRE_Pattern object at 0x0000000002806DD8>
comment:51 Changed 8 years ago by
Changes deployed to trac-hacks.org. Please test and let me know if there are any issues. Changes described in blog:2017-06-15-username-policy.
Some other ideas to consider:
- Don't let user comment with a registered email address in the author field
- Decorate changes made by unauthenticated users with a subtle indicator
Changed 8 years ago by
Attachment: | accounthack-smoketest.PNG added |
---|
comment:53 follow-up: 54 Changed 8 years ago by
In [16663], u'[\udb40[\udd00-\uddef]+
is wired. That should be (?:\udb40[\udd00-\uddef])+
.
It is not needed to use surrogate pairs in unicode literal. Python interpreter automatically convert from UCS-4 character to surrogate pair with narrow build.
Linux (wide build):
Python 2.5.6 (r256:88840, Oct 21 2014, 22:49:55) [GCC 4.8.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> hex(sys.maxunicode) '0x10ffff' >>> parse_arg_list('text=%F3%A0%84%80-%F3%A0%87%AF') [(u'text', u'\U000e0100-\U000e01ef')] >>> [hex(ord(c)) for c in parse_arg_list('text=%F3%A0%84%80-%F3%A0%87%AF')[0][1]] ['0xe0100', '0x2d', '0xe01ef'] >>> import re >>> re.compile(u'[\U000e0100-\U000e01ef]+', re.U) <_sre.SRE_Pattern object at 0x20cbf30>
Windows (narrow build):
Python 2.7.10 (default, May 23 2015, 09:44:00) [MSC v.1500 64 bit (AMD64)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> hex(sys.maxunicode) '0xffff' >>> from trac.web.api import parse_arg_list >>> parse_arg_list('text=%F3%A0%84%80-%F3%A0%87%AF') [(u'text', u'\U000e0100-\U000e01ef')] >>> [hex(ord(c)) for c in parse_arg_list('text=%F3%A0%84%80-%F3%A0%87%AF')[0][1]] ['0xdb40', '0xdd00', '0x2d', '0xdb40', '0xddef'] >>> import re >>> re.compile(u'[\U000e0100-\U000e01ef]+', re.U) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "C:\usr\src\trac\venv\trac-1.0.12\lib\re.py", line 194, in compile return _compile(pattern, flags) File "C:\usr\src\trac\venv\trac-1.0.12\lib\re.py", line 251, in _compile raise error, v # invalid expression sre_constants.error: bad character range
-
accountmanagerplugin/trunk/acct_mgr/tests/util.py
diff --git a/accountmanagerplugin/trunk/acct_mgr/tests/util.py b/accountmanagerplugin/trunk/acct_mgr/tests/util.py index ce89355c5..5d0ab0f69 100644
a b class UtilTestCase(unittest.TestCase): 48 48 self.assertEqual(u'user', remove_zwsp(u'user')) 49 49 self.assertEqual(u'user', remove_zwsp(u'\u200buser\u200b')) 50 50 self.assertEqual(u'user', remove_zwsp(u'\u200fu\ufe00ser\u061c')) 51 self.assertEqual(u'user', remove_zwsp(u'u\udb40\udd00ser')) 51 self.assertEqual(u'u\U000e00ffser', remove_zwsp(u'u\U000e00ffser')) 52 self.assertEqual(u'user', remove_zwsp(u'u\U000e0100ser')) 53 self.assertEqual(u'user', remove_zwsp(u'u\U000e01efser')) 54 self.assertEqual(u'u\U000e01f0ser', remove_zwsp(u'u\U000e01f0ser')) 52 55 53 56 54 57 def test_suite(): -
accountmanagerplugin/trunk/acct_mgr/util.py
diff --git a/accountmanagerplugin/trunk/acct_mgr/util.py b/accountmanagerplugin/trunk/acct_mgr/util.py index f57a84e1e..3e0a702a4 100644
a b def pretty_precise_timedelta(time1, time2=None, resolution=None, diff=0): 127 127 % (str(t) != '0' and t or '')).rstrip() 128 128 129 129 130 _zwsp_re = re.compile(u'([\\s\u200b-\u200f\u061c\u202a-\u202e' 131 u'\u2066-\u2069\u00ad\u2060\ufeff\u2061-\u2064' 132 u'\u115f\u1160\u180b-\u180d\ufe00-\ufe0f]+|' 133 u'[\udb40[\udd00-\uddef]+)', 134 re.UNICODE) 130 def _create_zwsp_re(): 131 ucs2_range = u'\\s\u200b-\u200f\u061c\u202a-\u202e\u2066-\u2069\u00ad' \ 132 u'\u2060\ufeff\u2061-\u2064\u115f\u1160\u180b-\u180d' \ 133 u'\ufe00-\ufe0f' 134 try: 135 pattern = re.compile(u'[%s\U000e0100-\U000e01ef]+' % ucs2_range, 136 re.UNICODE) 137 except re.error: 138 # Narrow build, `re` cannot use characters >= 0x10000 139 pattern = re.compile(u'[%s]+|(?:\udb40[\udd00-\uddef])+' % ucs2_range, 140 re.UNICODE) 141 return pattern 142 143 144 _zwsp_re = _create_zwsp_re() 135 145 136 146 137 147 def remove_zwsp(text):
comment:54 Changed 8 years ago by
Replying to Jun Omae:
In [16663],
u'[\udb40[\udd00-\uddef]+
is wired. That should be(?:\udb40[\udd00-\uddef])+
.
Ah. My example code in comment:48 is incorrect. Sorry.
I agree to both, your statements and the ticket rating. This is not good at all.