Modify

Opened 11 years ago

Closed 8 years ago

Last modified 4 years ago

#11622 closed defect (fixed)

users can post or comment on trac-hacks.org in the name of any registered user

Reported by: falkb Owned by: Ryan J Ollos
Priority: high Component: AccountManagerPlugin
Severity: critical Keywords:
Cc: Ryan J Ollos, Steffen Hoffmann, lkraav Trac Release:

Description

I noticed I can comment on tickets here on trac-hacks.org as falkb without being asked for my password.

I have no problem with anonymous users posting or commenting with fictive names or email addresses. I think this is even a useful feature for people who are don't forced to register when they want to report something.

But I think if one tries to post or comment with a registered account name or registered email address, such action has to be acknowledged by a password barrier. This way one may still post with fictive names (fine with me), but is not allowed to post in the name of a known user without confirmation.

I think this is important to prevent account abuse.

Attachments (5)

20140316_prevent-ambiguous-sid.patch (1.8 KB) - added by Steffen Hoffmann 11 years ago.
proposed changes for preventing ambiguous anonymous session IDs using AccountManagerPlugin
20140317_prevent-ambiguous-sid.patch (692 bytes) - added by Steffen Hoffmann 11 years ago.
2nd version including improved warning message according to falkb's recommendation
20140421_prevent-ambiguous-sid.patch (6.0 KB) - added by Steffen Hoffmann 11 years ago.
3rd version including resource manipulator interface implementations as well as hard request deflection for other cases
20140616_prevent-ambiguous-sid.patch (7.7 KB) - added by Ryan J Ollos 11 years ago.
accounthack-smoketest.PNG (36.1 KB) - added by falkb 8 years ago.

Download all attachments as: .zip

Change History (61)

comment:1 Changed 11 years ago by Steffen Hoffmann

I agree to both, your statements and the ticket rating. This is not good at all.

comment:2 Changed 11 years ago by falkb

Let's see if this works...

comment:3 in reply to:  2 Changed 11 years ago by Michael Renzmann

Replying to falkb:

Let's see if this works...

That was me. It obviously works. However, since the same works on teo, too I think this is a generic Trac issue rather than one specific to trac-hacks.org...

comment:4 Changed 11 years ago by Ryan J Ollos

The issue has been known on t.e.o for a long time. I saw one instance of someone pretending to be ecarter in a ticket a long time ago. There have also been some ticket comments and page edits using usernames that are active on t-h.o, such as myself or falkb. The edits were obviously spam in those cases.

I think we can eventually implement a username check, either in AccountManagerPlugin or TracHacksPlugin.

Last edited 11 years ago by Ryan J Ollos (previous) (diff)

comment:5 Changed 11 years ago by Steffen Hoffmann

Hm, I'm worried to see t:#1890, that exactly reflects the current state: unchanged source regarding this issue for 8 years by now and not even a comment within the last 3 years.

comment:6 in reply to:  4 Changed 11 years ago by Steffen Hoffmann

Replying to rjollos:

The issue has been known on t.e.o for a long time. ...

I do remember some discussion on the issue myself, yes.

I think we can eventually implement a username check, either in AccountManagerPlugin or TracHacksPlugin.

Good point. In the same notion as we prevent changes to invalid email addresses with email validation enabled we could enforce some sort of username clearance too. I'll start to work on this right after the TagsPlugin release, unless someone beats me with an earlier patch attached to an enhancement request for AccountManagerPlugin.

comment:7 Changed 11 years ago by Ryan J Ollos

After reading through trac:#1890, I'm left feeling that all of the proposed solutions will require quite a bit of effort, although a good solution seems to be outlined in trac:comment:53:ticket:1890 and trac:comment:94:ticket:1890.

A reasonably quick solution for trac-hacks.org would be to implement ITicketManipulator and IWikiManipulator: if req.authname == 'anonymous', then check if author matches a sid in the session table with authenticated = 1. If a match is found, reject the change and tell the user they must login in order to save a change with that username.

A minority of users may not like the requirement to login, but I think it's reasonable to require a login if you've previously registered an account.

comment:8 in reply to:  7 ; Changed 11 years ago by Steffen Hoffmann

Replying to rjollos:

A reasonably quick solution for trac-hacks.org would be to implement ITicketManipulator and IWikiManipulator: if req.authname == 'anonymous', then check if author matches a sid in the session table with authenticated = 1. If a match is found, reject the change and tell the user they must login in order to save a change with that username.

I'd go one step further and disallow to even change an anonymous session id to a registered username. While this would be easier done in Trac core, we can do so inside AcctMgr too.

A minority of users may not like the requirement to login, but I think it's reasonable to require a login if you've previously registered an account.

Absolutely reasonable for the sake of clarity, yes.

comment:9 in reply to:  8 ; Changed 11 years ago by Ryan J Ollos

Replying to hasienda:

I'd go one step further and disallow to even change an anonymous session id to a registered username.

So we just check if the author matches a sid in the session table and don't require authenticated = 1? I think that would work since the sids are hashes in the case of an unauthenticated session.

While this would be easier done in Trac core

If there's an easier fix to the Trac core, I think the issue would be worth pushing there. However we fix this on t-h.o, they'll probably want to fix it on t.e.o as well.

comment:10 in reply to:  9 Changed 11 years ago by Steffen Hoffmann

Replying to rjollos:

Replying to hasienda:

I'd go one step further and disallow to even change an anonymous session id to a registered username.

So we just check if the author matches a sid in the session table and don't require authenticated = 1? I think that would work since the sids are hashes in the case of an unauthenticated session.

Yes, that's the plan, and ideally we'll do so in Trac core code for setting non-random, custom session id's, but I think that at least meanwhile we can intercept on a higher level, or at a different end, i.e. by using a specially crafted authenticator to instantly stop such ambiguous sessions from proceeding.

Changed 11 years ago by Steffen Hoffmann

proposed changes for preventing ambiguous anonymous session IDs using AccountManagerPlugin

comment:11 Changed 11 years ago by Steffen Hoffmann

In 13756:

AccountManagerPlugin: Prevent skipping creation of a new authenticated session ID, refs #11622.

Research done for resolving another issue suddenly revealed this issue.

comment:12 Changed 11 years ago by Steffen Hoffmann

Proposed changes do not alter existing IDs but prevent everyone from choosing an existing authenticated session ID now and forever.

Hint: Any anonymous session ID that is identical to an authenticated session ID is deleted automatically next time an anonymous session is promoted to authenticated session ID of the same name (see Session.promote_session() in t:source:trunk/trac/web/session.py for details).

comment:13 Changed 11 years ago by falkb

Instead of "Choose another session ID, please." I'd rather say "Choose another session ID or login, please."

comment:14 Changed 11 years ago by falkb

even better "Login or choose another session ID, please."

Changed 11 years ago by Steffen Hoffmann

2nd version including improved warning message according to falkb's recommendation

comment:15 Changed 11 years ago by Steffen Hoffmann

More suggestions welcome. Or should it have been that easy? ;-)

Last edited 11 years ago by Steffen Hoffmann (previous) (diff)

comment:16 Changed 11 years ago by falkb

fine with me if it simply prevents to post in the name of registered others. I wonder why there was so much discussion in t:#1890 over so many years.

Last edited 11 years ago by Steffen Hoffmann (previous) (diff)

comment:17 Changed 11 years ago by lkraav

Cc: lkraav added

comment:18 in reply to:  15 ; Changed 11 years ago by paul@…

Replying to hasienda:

More suggestions welcome. Or should it have been that easy? ;-)

just do activate the patch! We'll see whether it helps or not

comment:19 in reply to:  18 Changed 11 years ago by Steffen Hoffmann

Replying to paul@…:

Replying to hasienda:

More suggestions welcome. Or should it have been that easy? ;-)

just do activate the patch! We'll see whether it helps or not

One step at a time. We don't want to introduce bugs by careless action. Even more since discussion on the corresponding ticket for Trac core went for years by now, so I would rather hear a bit more feedback before applying the patch to AccountManagerPlugin's trunk.

Afterwards we could roll it to this Trac instance, if a) I release another minor version of current stable branch of the plugin, effectively forwarding the change to every updated plugin installation, or b) someone patches the plugin source exclusively for the installation here.

Even with broad approval there is still no solution for Trac core. Many Trac application including t.e.o do not utilize AccountManagerPlugin, that we plan to alter here. And the currently suggested approach itself is generally flawed, because it relies on checking the Trac db table session. Partially repeating comments from the other ticket here for the list of shortcomings: There is not yet a way of

  • detecting authorized users, that did not log-in to Trac before
  • preventing irritation by username-alike session IDs (without knowing the full user list)
  • preventing choice of session IDs of accounts created in the future (likely a 'cantfix', but worth to look into for another pre-registration sanity check for AccountManager's registration procedure)

comment:20 in reply to:  12 ; Changed 11 years ago by Ryan J Ollos

Replying to hasienda:

Proposed changes do not alter existing IDs but prevent everyone from choosing an existing authenticated session ID now and forever.

I tested out the patch. It does prevent a user from populating the name attribute (SESSION_ATTRIBUTE.name) of an anonymous session with the sid of an authenticated user.

However, it doesn't prevent users from entering the sid of an authenticated user in the reporter field when creating tickets, in the author field when commenting on tickets and in the author field when editing the wiki. It seems that the latter is the more serious issue that needs to be addressed.

Regarding the code, this line confused me a bit:

not (req.authname and req.authname != 'anonymous')

The intention is more clear to me with:

(not req.authname or req.authname == 'anonymous')
Last edited 11 years ago by Ryan J Ollos (previous) (diff)

comment:21 in reply to:  20 ; Changed 11 years ago by Steffen Hoffmann

Replying to rjollos:

However, it doesn't prevent users from entering the sid of an authenticated user in the reporter field when creating tickets, in the author field when commenting on tickets and in the author field when editing the wiki. It seems that the latter is the more serious issue that needs to be addressed.

Sorry, I forgot about these places, because I generally do not allow such anonymous contributions in Trac applications I manage and always care to log-in myself before acting anywhere else too. This needs to get addressed by using the manipulator interfaces indeed.

Regarding the code, this line confused me a bit:

not (req.authname and req.authname != 'anonymous')

The intention is more clear to me with:

(not req.authname or req.authname == 'anonymous')

While the expressions are logically equivalent NOT(a AND b) == NOT(a) OR NOT(b), there is a subtle difference, that let python programmers prefer the first term: First and-ed statement will skip further evaluation of req.authname, if the value is empty or None), the 2nd or-ed statement needs to always check both parts.

comment:22 in reply to:  21 ; Changed 11 years ago by Ryan J Ollos

Replying to hasienda:

there is a subtle difference, that let python programmers prefer the first term: First and-ed statement will skip further evaluation of req.authname, if the value is empty or None), the 2nd or-ed statement needs to always check both parts.

More specifically I would say, a python programmer would saw the need for the optimization would prefer the first term. It is no matter though, as it may only be me that tends to get confused by the logic, and anything confusing can always be clarified with a comment.

comment:23 in reply to:  22 ; Changed 11 years ago by Steffen Hoffmann

Replying to rjollos:

More specifically I would say, a python programmer would saw the need for the optimization would prefer the first term. It is no matter though, as it may only be me that tends to get confused by the logic, and anything confusing can always be clarified with a comment.

I've seen the same pattern in Trac code several times and got confused about the logic as you did, even replaced it. From that experience I tell you, that it is not always about optimization. Instead of req.authname think of a method, that could return a list. We always want to look at the first list element, but in some cases the list might as well be empty. Here the first statement really shines, as it prevents an IndexError: list index out of range for the empty list, so it is in general a safer expression - nice defensive design example IMHO.

comment:24 Changed 11 years ago by lkraav

Not to turn this into a programming theory discussion, but should Yoda conditions be also generally used in Python? Not sure if so much applicable to the AND version, but definitely for the OR version.

not (req.authname and 'anonymous' != req.authname)
(not req.authname or 'anonymous' == req.authname)

comment:25 in reply to:  23 Changed 11 years ago by anonymous

Replying to hasienda:

Regarding the code, this line confused me a bit:

not (req.authname and req.authname != 'anonymous')

The intention is more clear to me with:

(not req.authname or req.authname == 'anonymous')

While the expressions are logically equivalent NOT(a AND b) == NOT(a) OR NOT(b), there is a subtle difference, that let python programmers prefer the first term: First and-ed statement will skip further evaluation of req.authname, if the value is empty or None), the 2nd or-ed statement needs to always check both parts.

The 2nd or-ed statement also skips the req.authname == 'anonymous' part if not req.authname. The or operator gets short-circuited just like and.

Replying to hasienda:

Instead of req.authname think of a method, that could return a list. We always want to look at the first list element, but in some cases the list might as well be empty. Here the first statement really shines, as it prevents an IndexError: list index out of range for the empty list, so it is in general a safer expression - nice defensive design example IMHO.

>>> list = []
>>> not(list and list[0] != 'anonymous')
True
>>> (not list or list[0] == 'anonymous')
True

comment:26 in reply to:  20 Changed 11 years ago by Steffen Hoffmann

Replying to rjollos:

However, it doesn't prevent users from entering the sid of an authenticated user in the reporter field when creating tickets, in the author field when commenting on tickets and in the author field when editing the wiki. It seems that the latter is the more serious issue that needs to be addressed.

Updated patch addresses all but the initial reporter field. I've just overlooked that, sorry.

Changed 11 years ago by Steffen Hoffmann

3rd version including resource manipulator interface implementations as well as hard request deflection for other cases

comment:27 Changed 11 years ago by Steffen Hoffmann

Right, the bogus anonymous reporter was handled as a generic case. Not as graceful as I wanted, but still a proof for other author input not handled by manipulators yet.

Just updated the patch in-place, because the additional change was only to include 'newticket' as another exception from the generic case. So it shall be complete now. More comments?

comment:28 Changed 11 years ago by Ryan J Ollos

I'll test and get you feedback within the next day or so.

Are you considering creating a 0.4.5 release that includes this change? That would allow us to deploy this soon and not depend on a patch instance of AccountManagerPlugin.

comment:29 in reply to:  28 Changed 11 years ago by Steffen Hoffmann

Replying to rjollos:

I'll test and get you feedback within the next day or so.

Would be great, thanks.

Are you considering creating a 0.4.5 release that includes this change? That would allow us to deploy this soon and not depend on a patch instance of AccountManagerPlugin.

I didn't think about it, but given the potential of confusion caused by current behavior I would be willing to push it, this way or another (finalize acct_mgr-0.5).

comment:30 Changed 11 years ago by Steffen Hoffmann

In 13914:

DiscussionPlugin: Enable fine-grained resource permission checks, refs #756 and #11622.

Changes base on a patch proposal by Simon Stelling - thank you very much.

Notes on alterations:

  • constructing resources from realm resource for clarity and code reduction
  • resource ID parsing in different places justifies dedicated private method
  • 'DISCUSSION_ADMIN' inherits 'DISCUSSION_MODERATE' permission action, but permission checks need to check minimal requirement only
  • more PEP8 changes (excessive white-space inside brackets, line-wrap)
  • gradually adopt 'Yoda conditions' recently suggested as good coding style by lkraav

comment:31 in reply to:  24 Changed 11 years ago by Ryan J Ollos

Replying to lkraav:

Not to turn this into a programming theory discussion, but should Yoda conditions be also generally used in Python?

Steffen captured this on the DevGuide#CodingStyle page. It could be discussed on the TracDev mailing list to see if this is something we'd want to adopt in t:TracDev/CodingStyle.

From the cases I've considered thus far, there is stronger justification for yoda conditions in C. In Python, the following is a syntax error:

if a = 3:
    print 'yes'

However, the following case would justify their use:

# suppose you intend this
b = a == some_constant
# but instead you type this, which is not a syntax error
b = a = some_constant
# on the other hand, the "yoda condition" form would be a syntax error
b = some_constant = a
# however, using parenthesis would result in a syntax error as well
b = (a = some_constant)
Last edited 11 years ago by Ryan J Ollos (previous) (diff)

comment:32 Changed 11 years ago by Jun Omae

About the patches, I think it is easy to escape the checking of author's name using Unicode zero-width space (U+200B), Right-to-left mark (U+200F), etc.

comment:33 in reply to:  32 Changed 11 years ago by Steffen Hoffmann

Replying to jun66j5:

About the patches, I think it is easy to escape the checking of author's name using Unicode

zero-width space (U+200B), Right-to-left mark (U+200F), etc.

Thanks for picking up on this, but you intention is not completely clear to me yet. Do you mean to encourage me to improve towards including Unicode white space, or do you feel that the whole approach is flawed somehow? I vaguely remember checking for such flag to no avail when developing for this feature. Any pointer on where to head from here would be appreciated.

comment:34 in reply to:  32 Changed 11 years ago by Ryan J Ollos

Replying to jun66j5:

About the patches, I think it is easy to escape the checking of author's name using Unicode zero-width space (U+200B), Right-to-left mark (U+200F), etc.

That is a good point. I can reproduce by pasting a few ZWSP characters into the author field before typing the name of an authenticated author. We could use stripws from the trac core, which is available since 1.0: trac:browser:/tags/trac-1.0/trac/util/text.py@:101#L98. For AccountManagerPlugin we'll need to copy the function to the compat module.

Btw, I had been wondering for some time why ZWSP had to be explicitly included in the regex along with \s. A possible explanation is: SO:8928710/121694.

Everything else with the patch looks good to me. Revised patch: 20140616_prevent-ambiguous-sid.patch.

It would be great to apply this to the AccountManagerPlugin and roll out a new version that we can use on t-h.o. Thanks for your efforts to implement this, and I apologize for the delay in my reply.

Changed 11 years ago by Ryan J Ollos

comment:35 Changed 11 years ago by Jun Omae

I think we should remove the zero-width characters from entire of the author rahter than strip from trailing and leading of the author.

Code point Bytes Example Notes
U+180E "jun" U+180E "66j5" jun᠎66j5 Maybe requires Mongolian Font
U+200B "jun" U+200B "66j5" jun​66j5
U+200C "jun" U+200C "66j5" jun‌66j5
U+200D "jun" U+200D "66j5" jun‍66j5
U+2060 "jun" U+2060 "66j5" jun⁠66j5
U+FEFF "jun" U+FEFF "66j5" jun66j5

See also https://www.cs.tut.fi/~jkorpela/chars/spaces.html and http://en.wikipedia.org/wiki/Space_%28punctuation%29#Spaces_in_Unicode.

Also, Directional Formatting Characters can be used to bypass the checking.

Code point Bytes Example
U+200E "jun" U+200E "66j5" jun‎66j5
U+200F "jun" U+200F "66j5" jun‏66j5

http://www.unicode.org/reports/tr9/#Directional_Formatting_Characters.

Last edited 11 years ago by Jun Omae (previous) (diff)

comment:36 Changed 11 years ago by Ryan J Ollos

Removing all zero-width characters from the string seems fairly straightforward. Do you think you've documented all of the zero-width characters, or do we need to do some additional searching? The list of all possible unicode characters is quite overwhelming.

comment:37 Changed 11 years ago by Jun Omae

I cannot find the documentation about ZWSPs. Instead, found FAQ in unicode.org, Q: Which characters should be displayed as invisible, if not supported?.

According to the answer, the following list is all of ZWSP characters.

Category Code point Example
cursive joiners U+200C a‌b
cursive joiners U+200D a‍b
bidirectional format controls U+061C a؜b
bidirectional format controls U+200E a‎b
bidirectional format controls U+200F a‏b
bidirectional format controls U+202A a‪b
bidirectional format controls U+202B a‫b
bidirectional format controls U+202C a‬b
bidirectional format controls U+202D a‭b
bidirectional format controls U+202E a‮b
bidirectional format controls U+2066 a⁥b
bidirectional format controls U+2067 a⁧b
bidirectional format controls U+2068 a⁨b
bidirectional format controls U+2069 a⁩b
the soft hyphen U+00AD a­b
word joiners U+2060 a⁠b
word joiners U+FEFF ab
the zero width space U+200B a​b
invisible math operators U+2061 a⁡b
invisible math operators U+2062 a⁢b
invisible math operators U+2063 a⁣b
invisible math operators U+2064 a⁤b
Jamo filler characters U+115F aᅟb
Jamo filler characters U+1160 aᅠb
variation selectors U+180B a᠋b
variation selectors U+180C a᠌b
variation selectors U+180D a᠍b
variation selectors U+FE00 – U+FE0F a︀b a️b
variation selectors U+E0100 – U+E01EF a󠄀b a󠇯b

comment:38 in reply to:  37 Changed 10 years ago by Ryan J Ollos

Replying to jun66j5:

I cannot find the documentation about ZWSPs. Instead, found FAQ in unicode.org, Q: Which characters should be displayed as invisible, if not supported?.

According to the answer, the following list is all of ZWSP characters.

Thanks, I'll make a modified version of stripws, and after the patch is complete we can push the change back to the Trac core if the modified function is generally useful. If anyone else wants to take the lead on revising the patch, please fell free.

comment:39 Changed 10 years ago by Ryan J Ollos

Component: TracHacksAccountManagerPlugin
Owner: changed from Michael Renzmann to Steffen Hoffmann

Moving this to AccountManagerPlugin since the patch will be applied there.

comment:40 Changed 10 years ago by falkb

I've posted this comment as falkb again without being asked for my password. :-(

comment:41 Changed 10 years ago by Ryan J Ollos

Of course you did. Do you see resolution fixed on this ticket?

comment:42 Changed 10 years ago by falkb

Sorry, my comment was not meant rude. I just see something has happened here in the meanwhile (which could be partial commits on the way to the 'fixed' state) but currently I don't have time to read it all. So it was just a very fast contribution of my user test, with the idea of better testing too much than to less. Friendly greetings to you all :-)

comment:43 Changed 10 years ago by Ryan J Ollos

I see. Well I apologize and should not have assumed you were just nagging us ;)

The patch needs to be revised per comment:32. It's not currently my highest priority, but I would like to get it fixed for AccountManagerPlugin release 0.5.

comment:44 in reply to:  43 Changed 10 years ago by Steffen Hoffmann

Replying to rjollos:

...
I would like to get it fixed for AccountManagerPlugin release 0.5.

Great. I did not look into Jun's contribution so far as well, but this should really go into next release, given the rather big impact on (anonymous) request handling.

comment:45 Changed 10 years ago by Ryan J Ollos

Owner: changed from Steffen Hoffmann to Ryan J Ollos
Status: newaccepted

comment:46 Changed 8 years ago by Ryan J Ollos

Resolution: fixed
Status: acceptedclosed

In 16662:

TracAccountManager 0.5dev: Prevent anonymous commenting as a registered user

Initial patch by hasienda, with contributions from jun66j5.

Fixes #11622.

comment:47 Changed 8 years ago by Ryan J Ollos

This ticket was long overdue.

I tried including the variation selectors in the regex, but the tests fail. Do the code points outside plane 0 need to be specified in a different way? Anyway, I don't think that minor point is too important.

  • acct_mgr/tests/util.py

     
    4747    def test_remove_zwsp(self):
    4848        self.assertEqual(u'user', remove_zwsp(u'\u200buser\u200b'))
    4949        self.assertEqual(u'user', remove_zwsp(u'\u200fu\ufe00ser\u061c'))
     50        self.assertEqual(u'user', remove_zwsp(u'fu\ue0100ser'))
    5051
    5152
    5253def test_suite():
  • acct_mgr/util.py

     
    129129
    130130_zwsp_re = re.compile(u'[\\s\u200b-\u200f\u061c\u202a-\u202e'
    131131                      u'\u2066-\u2069\u00ad\u2060\ufeff\u2061-\u2064'
    132                       u'\u115f\u1160\u180b-\u180d\ufe00-\ufe0f]+',
    133                       re.UNICODE)
     132                      u'\u115f\u1160\u180b-\u180d\ufe00-\ufe0f'
     133                      u'\ue0100-\ue01ef]+', re.UNICODE)
    134134
    135135
    136136def remove_zwsp(text):
$ python -m unittest acct_mgr.tests.util
.F
======================================================================
FAIL: test_remove_zwsp (acct_mgr.tests.util.UtilTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "acct_mgr/tests/util.py", line 48, in test_remove_zwsp
    self.assertEqual(u'user', remove_zwsp(u'\u200buser\u200b'))
AssertionError: u'user' != u''
- user
+


----------------------------------------------------------------------
Ran 2 tests in 0.001s

FAILED (failures=1)

I'll deploy changes to trac-hacks.org within a few days and make a blog post.

If anyone complains about the new behavior running TracAccountManager in their Trac instance, we can move the manipulators and request filter into a component that can be disabled.

comment:48 Changed 8 years ago by Jun Omae

The u'fu\ue0100ser' should be u'fu\U000e0100ser'.

Linux:

Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> [hex(ord(c)) for c in u'fu\ue0100ser']
['0x66', '0x75', '0xe010', '0x30', '0x73', '0x65', '0x72']
>>> [hex(ord(c)) for c in u'fu\U000e0100ser']
['0x66', '0x75', '0xe0100', '0x73', '0x65', '0x72']

Windows:

Python 2.7.10 (default, May 23 2015, 09:44:00) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> [hex(ord(c)) for c in u'fu\U000e0100ser']
['0x66', '0x75', '0xdb40', '0xdd00', '0x73', '0x65', '0x72']

If Python is narrow build, U+10000 and later characters cannot be used in regular expression. We should use surrogate pair.

>>> import re
>>> re.compile(u'[\U000e0100-\U000e01ef]+', re.UNICODE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\usr\apps\python27-amd64\lib\re.py", line 194, in compile
    return _compile(pattern, flags)
  File "C:\usr\apps\python27-amd64\lib\re.py", line 251, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character range
>>> re.compile(u'\udb40[\udd00-\uddef]+', re.UNICODE)
<_sre.SRE_Pattern object at 0x0000000002806DD8>
Last edited 8 years ago by Jun Omae (previous) (diff)

comment:49 Changed 8 years ago by Ryan J Ollos

Thanks, I'll commit that change.

comment:50 Changed 8 years ago by Ryan J Ollos

In 16663:

TracAccountManager 0.5dev: Remove variation selectors

Patch by Jun Omae.

Refs #11622.

comment:51 Changed 8 years ago by Ryan J Ollos

Changes deployed to trac-hacks.org. Please test and let me know if there are any issues. Changes described in blog:2017-06-15-username-policy.

Some other ideas to consider:

  • Don't let user comment with a registered email address in the author field
  • Decorate changes made by unauthenticated users with a subtle indicator

Changed 8 years ago by falkb

Attachment: accounthack-smoketest.PNG added

comment:52 Changed 8 years ago by falkb

Works great! Thanks a lot. Smoke-test succeeded.

comment:53 Changed 8 years ago by Jun Omae

In [16663], u'[\udb40[\udd00-\uddef]+ is wired. That should be (?:\udb40[\udd00-\uddef])+.

It is not needed to use surrogate pairs in unicode literal. Python interpreter automatically convert from UCS-4 character to surrogate pair with narrow build.

Linux (wide build):

Python 2.5.6 (r256:88840, Oct 21 2014, 22:49:55)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> hex(sys.maxunicode)
'0x10ffff'
>>> parse_arg_list('text=%F3%A0%84%80-%F3%A0%87%AF')
[(u'text', u'\U000e0100-\U000e01ef')]
>>> [hex(ord(c)) for c in parse_arg_list('text=%F3%A0%84%80-%F3%A0%87%AF')[0][1]]
['0xe0100', '0x2d', '0xe01ef']
>>> import re
>>> re.compile(u'[\U000e0100-\U000e01ef]+', re.U)
<_sre.SRE_Pattern object at 0x20cbf30>

Windows (narrow build):

Python 2.7.10 (default, May 23 2015, 09:44:00) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> hex(sys.maxunicode)
'0xffff'
>>> from trac.web.api import parse_arg_list
>>> parse_arg_list('text=%F3%A0%84%80-%F3%A0%87%AF')
[(u'text', u'\U000e0100-\U000e01ef')]
>>> [hex(ord(c)) for c in parse_arg_list('text=%F3%A0%84%80-%F3%A0%87%AF')[0][1]]
['0xdb40', '0xdd00', '0x2d', '0xdb40', '0xddef']
>>> import re
>>> re.compile(u'[\U000e0100-\U000e01ef]+', re.U)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\usr\src\trac\venv\trac-1.0.12\lib\re.py", line 194, in compile
    return _compile(pattern, flags)
  File "C:\usr\src\trac\venv\trac-1.0.12\lib\re.py", line 251, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character range
  • accountmanagerplugin/trunk/acct_mgr/tests/util.py

    diff --git a/accountmanagerplugin/trunk/acct_mgr/tests/util.py b/accountmanagerplugin/trunk/acct_mgr/tests/util.py
    index ce89355c5..5d0ab0f69 100644
    a b class UtilTestCase(unittest.TestCase): 
    4848        self.assertEqual(u'user', remove_zwsp(u'user'))
    4949        self.assertEqual(u'user', remove_zwsp(u'\u200buser\u200b'))
    5050        self.assertEqual(u'user', remove_zwsp(u'\u200fu\ufe00ser\u061c'))
    51         self.assertEqual(u'user', remove_zwsp(u'u\udb40\udd00ser'))
     51        self.assertEqual(u'u\U000e00ffser', remove_zwsp(u'u\U000e00ffser'))
     52        self.assertEqual(u'user', remove_zwsp(u'u\U000e0100ser'))
     53        self.assertEqual(u'user', remove_zwsp(u'u\U000e01efser'))
     54        self.assertEqual(u'u\U000e01f0ser', remove_zwsp(u'u\U000e01f0ser'))
    5255
    5356
    5457def test_suite():
  • accountmanagerplugin/trunk/acct_mgr/util.py

    diff --git a/accountmanagerplugin/trunk/acct_mgr/util.py b/accountmanagerplugin/trunk/acct_mgr/util.py
    index f57a84e1e..3e0a702a4 100644
    a b def pretty_precise_timedelta(time1, time2=None, resolution=None, diff=0): 
    127127            % (str(t) != '0' and t or '')).rstrip()
    128128
    129129
    130 _zwsp_re = re.compile(u'([\\s\u200b-\u200f\u061c\u202a-\u202e'
    131                       u'\u2066-\u2069\u00ad\u2060\ufeff\u2061-\u2064'
    132                       u'\u115f\u1160\u180b-\u180d\ufe00-\ufe0f]+|'
    133                       u'[\udb40[\udd00-\uddef]+)',
    134                       re.UNICODE)
     130def _create_zwsp_re():
     131    ucs2_range = u'\\s\u200b-\u200f\u061c\u202a-\u202e\u2066-\u2069\u00ad' \
     132                 u'\u2060\ufeff\u2061-\u2064\u115f\u1160\u180b-\u180d' \
     133                 u'\ufe00-\ufe0f'
     134    try:
     135        pattern = re.compile(u'[%s\U000e0100-\U000e01ef]+' % ucs2_range,
     136                             re.UNICODE)
     137    except re.error:
     138        # Narrow build, `re` cannot use characters >= 0x10000
     139        pattern = re.compile(u'[%s]+|(?:\udb40[\udd00-\uddef])+' % ucs2_range,
     140                             re.UNICODE)
     141    return pattern
     142
     143
     144_zwsp_re = _create_zwsp_re()
    135145
    136146
    137147def remove_zwsp(text):
Last edited 8 years ago by Jun Omae (previous) (diff)

comment:54 in reply to:  53 Changed 8 years ago by Jun Omae

Replying to Jun Omae:

In [16663], u'[\udb40[\udd00-\uddef]+ is wired. That should be (?:\udb40[\udd00-\uddef])+.

Ah. My example code in comment:48 is incorrect. Sorry.

comment:55 Changed 8 years ago by Ryan J Ollos

Complicated! Thanks for the patch.

comment:56 Changed 8 years ago by Ryan J Ollos

In 16666:

TracAccountManager 0.5dev: Fix regex for narrow and wide builds

Patch by Jun Omae.

Refs #11622.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain Ryan J Ollos.
The resolution will be deleted. Next status will be 'reopened'.

Add Comment


E-mail address and name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.