Ticket #8266 (closed defect: fixed)

Opened 2 years ago

Last modified 2 years ago

If acronym contains a hyphen, it is not linked to correct page

Reported by: rjollos
Assigned to: rjollos
Priority: normal
Component: AcronymsPlugin
Severity: normal
Keywords: unicode
Cc: hasienda, morpheus.me@gmail.com
Trac Release: 0.11

Description

As described in comment:3#857, the following example:

||XXX    || XXX Page     || XXXPage    || ||
||YYY-XXX|| YYY-XXX Page || YYY-XXXPage|| ||

Results in the output shown in the attached HyphenExample.png.

Attachments

HyphenExample.png (8.2 kB) - added by rjollos on 12/04/10 11:44:57.

Change History

12/04/10 11:44:57 changed by rjollos

  • attachment HyphenExample.png added.

12/04/10 12:11:05 changed by rjollos

  • status changed from new to assigned.
  • owner changed from athomas to rjollos.

12/05/10 15:35:25 changed by rjollos

I've traced this to the regular expression not matching hyphenated acronyms such as YYY-XXX. We'll need to modify the regular expression:

valid_acronym = re.compile('^\w+$')
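
To make the failure concrete, here is a minimal sketch that applies the pattern quoted above to the two acronyms from the description. The name hyphen_ok and its pattern are hypothetical, shown only to illustrate one way a hyphen could be admitted; they are not the change that was committed for this ticket.

```python
import re

# Pattern quoted in the comment above: word characters only.
valid_acronym = re.compile(r'^\w+$')

# Hypothetical alternative (an assumption, not the plugin's code):
# also allow a literal hyphen inside the acronym.
hyphen_ok = re.compile(r'^[\w-]+$')

for candidate in ('XXX', 'YYY-XXX'):
    # The hyphen is not a word character, so \w+ cannot span 'YYY-XXX'.
    print(candidate,
          bool(valid_acronym.match(candidate)),
          bool(hyphen_ok.match(candidate)))
```

With the original pattern only XXX is accepted, while the widened character class accepts both.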

(follow-up: ↓ 5 ) 12/05/10 15:49:22 changed by rjollos

  • cc set to hasienda.

I've added the UNICODE flag so that acronyms with unicode characters classified as alphanumeric will be matched. An alternative would be to set the LOCALE flag, in which case characters classified as alphanumeric in the environment's locale would be matched. I'm not sure which is better.

hasienda, is this something you'd like to test out, since you have done a lot of work with locales?

12/05/10 15:52:03 changed by rjollos

(In [9585]) Refs #8266:

  • Some minor refactoring.
  • Set the UNICODE flag when compiling the regular expression used to match acronyms.

(in reply to: ↑ 3 ) 12/05/10 21:28:32 changed by hasienda

  • keywords set to unicode.

Replying to rjollos:

hasienda, is this something you'd like to test out, since you have done a lot of work with locales?

Will do and report back here; thank you for the hint.

12/12/10 07:07:12 changed by rjollos

  • cc changed from hasienda to hasienda, morpheus.me@gmail.com.

Received a hint about this in #5938, and will submit a fix shortly.

12/12/10 07:07:45 changed by rjollos

  • status changed from assigned to closed.
  • resolution set to fixed.

(In [9662]) Use \S in the regular expression that extracts acronym definitions from the /wiki/acronym page. \S will match any non-whitespace character, whereas \w only matches alphanumeric characters and the underscore. Fixes #8266.
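
The difference between \w and \S can be sketched against the table rows from the description. The two patterns and the helper first_cell below are hypothetical stand-ins; the plugin's actual extraction expression is not shown in this ticket.

```python
import re

# Hypothetical extraction patterns (assumptions for illustration only).
# Both capture the first cell of a '||'-delimited wiki table row.
WORD_ONLY = re.compile(r'^\|\|\s*(\w+)\s*\|\|')
NON_SPACE = re.compile(r'^\|\|\s*(\S+?)\s*\|\|')

ROWS = [
    '||XXX    || XXX Page     || XXXPage    || ||',
    '||YYY-XXX|| YYY-XXX Page || YYY-XXXPage|| ||',
]


def first_cell(pattern, row):
    """Return the acronym captured from the first cell, or None."""
    match = pattern.match(row)
    return match.group(1) if match else None


for row in ROWS:
    # \w+ stops at the hyphen and the row is rejected; \S+? runs up to
    # the closing '||' and keeps the hyphenated acronym intact.
    print(first_cell(WORD_ONLY, row), first_cell(NON_SPACE, row))
```

The trade-off is that \S accepts any non-whitespace character, so punctuation that was never meant to be part of an acronym is accepted as well.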

07/01/11 22:52:37 changed by hasienda

I'm going back through the issues for this plugin now while preparing for an upcoming Trac application.

The regexp change successfully solved another issue for me: acronyms with Unicode characters like German umlauts. Before coming to this ticket I had run my own experiments on this matter, and the results were rather confusing to me: the re.U flag on the r'^\w+$' expression didn't produce the expected matches:

Python 2.6.6 (r266:84292, Dec 27 2010, 00:02:40) 
[GCC 4.4.5] on linux2
>>> import re
>>> RE = re.compile(r'^\w+$', re.U)
>>> RE.match('ä')
>>> RE.match('ö')
>>> RE.match('ü')
<_sre.SRE_Match object at 0xb7359d08>
>>> RE.match('ß')
>>> RE.match('Ä')
>>> RE.match('Ö')
>>> RE.match('Ü')

The re.L flag didn't change the matches at all, so the very general \S match is the best I can see right now. Still, it troubles me that I may not understand those flags correctly...
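
One possible explanation for the session above, offered as an assumption rather than a confirmed diagnosis: in Python 2 a plain '...' literal is a byte string, so on a UTF-8 terminal 'ä' reaches the regular expression as two bytes instead of one character, and re.U then classifies each byte on its own. The sketch below is written for Python 3, where text and bytes are separate types; the helper as_python2_would_see is hypothetical and only approximates the Python 2 behaviour.

```python
import re

# Same expression as in the session above. For str patterns in Python 3
# Unicode matching is already the default, so re.U is redundant here.
word_str = re.compile(r'^\w+$', re.U)

# A bytes pattern treats only ASCII letters, digits and '_' as \w.
word_bytes = re.compile(rb'^\w+$')

umlauts = ['ä', 'ö', 'ü', 'ß', 'Ä', 'Ö', 'Ü']


def as_python2_would_see(text):
    # Assumption: reinterpret every UTF-8 byte as a separate code point,
    # roughly what re.U did with a Python 2 byte string.
    return text.encode('utf-8').decode('latin-1')


# Real text: every umlaut is a word character.
str_hits = [c for c in umlauts if word_str.match(c)]

# Raw UTF-8 bytes: none of the non-ASCII bytes count as \w.
bytes_hits = [c for c in umlauts if word_bytes.match(c.encode('utf-8'))]

# Bytes misread as characters: a result depends on what each byte
# happens to look like, which may explain an uneven outcome.
mojibake_hits = [c for c in umlauts
                 if word_str.match(as_python2_would_see(c))]

print(str_hits)
print(bytes_hits)
print(mojibake_hits)
```

If this reading is right, passing unicode literals such as u'ä' in the Python 2 session would be the thing to try before concluding that the flags do not work.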

