Opened 7 years ago

Closed 7 years ago

Last modified 6 years ago

#8266 closed defect (fixed)

If acronym contains a hyphen, it is not linked to correct page

Reported by: Ryan J Ollos
Owned by: Ryan J Ollos
Priority: normal
Component: AcronymsPlugin
Severity: normal
Keywords: unicode
Cc: Steffen Hoffmann, morpheus.me@…
Trac Release: 0.11

Description

As described in comment:3#857, the following example:

||XXX    || XXX Page     || XXXPage    || ||
||YYY-XXX|| YYY-XXX Page || YYY-XXXPage|| ||

Results in (see the attached HyphenExample.png):

Attachments (1)

HyphenExample.png (8.2 KB) - added by Ryan J Ollos 7 years ago.

Change History (9)

Changed 7 years ago by Ryan J Ollos

Attachment: HyphenExample.png added

comment:1 Changed 7 years ago by Ryan J Ollos

Owner: changed from Alec Thomas to Ryan J Ollos
Status: new → assigned

comment:2 Changed 7 years ago by Ryan J Ollos

I've traced this to the regular expression not matching YYY-XXX, because \w does not match the hyphen. We'll need to modify the regular expression:

valid_acronym = re.compile('^\w+$')
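
The mismatch can be reproduced in isolation. This is a minimal sketch: `valid_acronym` is the pattern quoted above, while `hyphen_ok` is a hypothetical alternative shown only for comparison and is not the fix that was eventually committed.

```python
import re

# The pattern quoted in this comment: word characters only.
valid_acronym = re.compile(r'^\w+$')

# Hypothetical alternative (an assumption, not the committed fix):
# also accept hyphens inside the acronym.
hyphen_ok = re.compile(r'^[\w-]+$')

plain = valid_acronym.match('XXX')           # matches
hyphenated = valid_acronym.match('YYY-XXX')  # None: '-' is not a \w character
hyphenated_alt = hyphen_ok.match('YYY-XXX')  # matches
```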

comment:3 Changed 7 years ago by Ryan J Ollos

Cc: Steffen Hoffmann added; anonymous removed

I've added the UNICODE flag so that acronyms with unicode characters classified as alphanumeric will be matched. An alternative would be to set the LOCALE flag, in which case characters classified as alphanumeric in the environment's locale would be matched. I'm not sure which is better.
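
A rough illustration of what the UNICODE flag changes, written for Python 3 as an assumption (the plugin at the time ran on Python 2). On Python 3, `re.UNICODE` is already the default for `str` patterns, `re.ASCII` restores the ASCII-only behaviour, and `re.LOCALE` is only accepted for bytes patterns, so this sketch compares the first two.

```python
import re

# \w with Unicode semantics: any character classified as alphanumeric.
unicode_word = re.compile(r'^\w+$', re.UNICODE)

# \w restricted to [a-zA-Z0-9_].
ascii_word = re.compile(r'^\w+$', re.ASCII)

# 'ÄÖÜ' written with escapes so the example does not depend on file encoding.
acronym = '\u00c4\u00d6\u00dc'

matches_unicode = unicode_word.match(acronym) is not None
matches_ascii = ascii_word.match(acronym) is not None
```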

hasienda, is this something you'd like to test out, since you have done a lot of work with locales?

comment:4 Changed 7 years ago by Ryan J Ollos

(In [9585]) Refs #8266:

  • Some minor refactoring.
  • Set the UNICODE flag when compiling the regular expression used to match acronyms.

comment:5 in reply to:  3 Changed 7 years ago by Steffen Hoffmann

Keywords: unicode added

Replying to rjollos:

hasienda, is this something you'd like to test out, since you have done a lot of work with locales?

Will do and report back here; thank you for the hint.

comment:6 Changed 7 years ago by Ryan J Ollos

Cc: morpheus.me@… added

Received a hint about this in #5938, and will submit a fix shortly.

comment:7 Changed 7 years ago by Ryan J Ollos

Resolution: fixed
Status: assigned → closed

(In [9662]) Use \S in the regular expression that extracts acronym definitions from the /wiki/acronym page. \S will match any non-whitespace character, whereas \w only matches alphanumeric characters and the underscore. Fixes #8266.
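
The difference between the two character classes can be sketched against the table rows from the description. The patterns below are hypothetical stand-ins written for this illustration; the plugin's actual expression is not quoted in this ticket.

```python
import re

# Hypothetical extraction patterns (assumptions, not the plugin's real code):
# capture the first cell of a "||acronym|| ..." row.
word_pat = re.compile(r'^\|\|(\w+)\s*\|\|')       # \w: alphanumerics and '_'
nonspace_pat = re.compile(r'^\|\|(\S+?)\s*\|\|')  # \S: any non-whitespace

rows = [
    '||XXX    || XXX Page     || XXXPage    || ||',
    '||YYY-XXX|| YYY-XXX Page || YYY-XXXPage|| ||',
]

def first_cell(pattern, row):
    """Return the captured acronym, or None if the row does not match."""
    m = pattern.match(row)
    return m.group(1) if m else None

with_word = [first_cell(word_pat, r) for r in rows]
with_nonspace = [first_cell(nonspace_pat, r) for r in rows]
```

With the `\w` variant the hyphenated row is skipped entirely, while the `\S` variant captures both acronyms.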

comment:8 Changed 6 years ago by Steffen Hoffmann

I'm going back through the issues for this plugin while preparing for an upcoming Trac application.

The regexp change successfully solved another issue for me: acronyms with Unicode characters like German umlauts. Before coming to this ticket I had done my own experiments on this matter. The results were rather confusing to me: adding the re.U flag to the r'^\w+$' expression didn't produce the expected matches:

Python 2.6.6 (r266:84292, Dec 27 2010, 00:02:40) 
[GCC 4.4.5] on linux2
>>> import re
>>> RE = re.compile(r'^\w+$', re.U)
>>> RE.match('ä')
>>> RE.match('ö')
>>> RE.match('ü')
<_sre.SRE_Match object at 0xb7359d08>
>>> RE.match('ß')
>>> RE.match('Ä')
>>> RE.match('Ö')
>>> RE.match('Ü')

The re.L flag didn't change the matches at all. So the very general \S match is the best I can see right now. Still, it troubles me that I may not understand those flags correctly...
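
One possible explanation for the session above, offered as an assumption rather than something confirmed in this ticket: in Python 2 a literal such as 'ü' typed in a UTF-8 terminal is a two-byte str, not a unicode object, and with re.U each byte is classified as if it were a separate character. The Python 3 sketch below emulates that by decoding the UTF-8 bytes as Latin-1; `as_python2_bytes` is a hypothetical helper introduced only for this illustration.

```python
import re

word = re.compile(r'^\w+$', re.UNICODE)

def as_python2_bytes(text):
    # Assumption: emulate a Python 2 byte-string literal entered in a UTF-8
    # terminal, with every byte treated as one character.
    return text.encode('utf-8').decode('latin-1')

# ä ö ü ß Ä Ö Ü, written with escapes to avoid file-encoding issues.
chars = '\u00e4\u00f6\u00fc\u00df\u00c4\u00d6\u00dc'

# Matching the raw bytes, one pseudo-character per byte.
as_bytes = {ch: bool(word.match(as_python2_bytes(ch))) for ch in chars}

# Matching real Unicode strings (u'...' literals in Python 2).
as_text = {ch: bool(word.match(ch)) for ch in chars}
```

Under this reading only 'ü' matches as bytes, because its second byte (0xBC) happens to be classified as numeric ('¼'), whereas the second byte of each of the other characters is a symbol or control character. Matching against unicode objects instead should make all seven characters match.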