Opened 14 years ago
Last modified 5 years ago
#8377 new defect
Valid acronyms with underlined wiki markup are not tagged as acronyms
Reported by: | Ben Allen | Owned by: | |
---|---|---|---|
Priority: | normal | Component: | AcronymsPlugin |
Severity: | normal | Keywords: | |
Cc: | Trac Release: | 0.12 |
Description (last modified by )
When an acronym is underlined, AcronymsPlugin does not detect it or add the <acronym> tag to it. If the acronym contains other style elements "inside" the underline, then the acronym is tagged as expected.
For example, take the following wiki content:
Acronym test: SCSI __SCSI__ '''SCSI''' ''SCSI'' '''''SCSI''''' __''SCSI''__ __'''SCSI'''__ ''__SCSI__'' '''__SCSI__''' '''__''SCSI''__'''
The following HTML is generated (newlines added for readability):
Acronym test: <acronym title="Small Computer Simple Interface">SCSI</acronym>
<span class="underline">SCSI</span>
<strong><acronym title="Small Computer Simple Interface">SCSI</acronym></strong>
<em><acronym title="Small Computer Simple Interface">SCSI</acronym></em>
<strong><em><acronym title="Small Computer Simple Interface">SCSI</acronym></em></strong>
<span class="underline"><em><acronym title="Small Computer Simple Interface">SCSI</acronym></em></span>
<span class="underline"><strong><acronym title="Small Computer Simple Interface">SCSI</acronym></strong></span>
<em><span class="underline">SCSI</span></em>
<strong><span class="underline">SCSI</span></strong>
<strong><span class="underline"><em><acronym title="Small Computer Simple Interface">SCSI</acronym></em></span></strong>
which displays as:
Acronym test: SCSI SCSI SCSI SCSI SCSI SCSI SCSI SCSI SCSI SCSI
The underlined text was not made into an acronym. The underline + italics and underline + bold cases were, but only if the underline markup was on the *outside* of the bold/italics markup. Curiously enough, the underline + bold + italics case works as long as the underline is not the innermost markup element.
My guess is that the parser is allowing underscores in an acronym and is interpreting the double-underscore as part of the acronym (thus it doesn't match anything in the acronym list so it doesn't get tagged). Stripping off leading and trailing non-alphanumeric characters before comparing the text to the acronym list should fix this problem, but I haven't tried to patch it myself so I can't say for sure.
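The stripping idea in the paragraph above could look roughly like the following. This is a minimal standalone sketch, not the plugin's code: the `ACRONYMS` table and the `lookup_acronym` helper are hypothetical names introduced for illustration, and the title string is copied from the generated HTML shown earlier.

```python
import re

# Hypothetical acronym table; the real plugin builds its own table elsewhere.
ACRONYMS = {"SCSI": "Small Computer Simple Interface"}


def lookup_acronym(token, acronyms=ACRONYMS):
    """Look up a token after stripping leading/trailing non-alphanumerics.

    Returns (core, title) when the stripped token is a known acronym,
    otherwise None.
    """
    # Remove runs of non-alphanumeric characters at either end only,
    # so "__SCSI__" becomes "SCSI" while inner characters are untouched.
    core = re.sub(r"^[^A-Za-z0-9]+|[^A-Za-z0-9]+$", "", token)
    title = acronyms.get(core)
    return (core, title) if title is not None else None
```

Whether this helps inside Trac depends on the underlined text reaching the plugin at all, which this sketch does not address.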
Attachments (0)
Change History (9)
comment:1 Changed 14 years ago by
Description: | modified (diff) |
---|---|
Priority: | normal → high |
Status: | new → assigned |
comment:2 Changed 14 years ago by
comment:3 Changed 14 years ago by
comment:4 follow-up: 6 Changed 14 years ago by
The issue here appears to be similar to what I found in comment:1:ticket:8267.
We implement the IWikiSyntaxProvider method:
# IWikiSyntaxProvider methods
def get_wiki_syntax(self):
    if self.compiled_acronyms:
        yield (self.compiled_acronyms, self._acronym_formatter)
__SCSI__ is not passed to the callback _acronym_formatter, so it is not being matched to compiled_acronyms. I'm not sure if this is an internal issue with Trac, or if I can somehow modify compiled_acronyms, which for my test page looks like:
\b(?P<acronym>RFC2316|SCSI|ROM|URL|RFC)(?P<acronymselector>\w*)\b
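To see how the pattern above behaves on its own, it can be run against the test inputs from the description using Python's `re` module. This is only a sketch of the regular expression in isolation; Trac combines plugin patterns with its own wiki syntax, so behavior inside the formatter may differ. The `samples` and `results` names are illustrative.

```python
import re

# The pattern quoted above, compiled on its own (outside of Trac).
pattern = re.compile(
    r"\b(?P<acronym>RFC2316|SCSI|ROM|URL|RFC)(?P<acronymselector>\w*)\b"
)

# Wiki snippets taken from the ticket description.
samples = ["SCSI", "__SCSI__", "'''SCSI'''", "__''SCSI''__", "''__SCSI__''"]

# True where the pattern finds an acronym somewhere in the snippet.
results = {s: bool(pattern.search(s)) for s in samples}

for snippet, matched in results.items():
    print("%-16s matched=%s" % (snippet, matched))
```

The snippets where `__` sits directly next to the acronym do not match: `_` is a word character, so there is no `\b` word boundary between `_` and `S`. A quote character is not a word character, so the bold and italic cases do match. That lines up with the pattern of failures in the description.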
comment:5 Changed 14 years ago by
Priority: | high → normal |
---|
See also #857. I suspect all these tickets (#857, #8267, #8377) are related.
Since I'm stuck on this at the moment and have a number of other tickets that I know how to fix, I'm dropping the priority and will keep an eye out for a solution as I study the Trac source code.
If you want to do any additional investigation, I'll take quick action on any hints or a patch.
comment:6 Changed 14 years ago by
Replying to rjollos:
__SCSI__ is not passed to the callback _acronym_formatter, so it is not being matched to compiled_acronyms. I'm not sure if this is an internal issue with Trac, or if I can somehow modify compiled_acronyms, which for my test page looks like:
\b(?P<acronym>RFC2316|SCSI|ROM|URL|RFC)(?P<acronymselector>\w*)\b
If Trac isn't passing the text to the callback, then I don't think that it's an error in the plugin. You might be able to work around it, however.
I haven't delved too deeply into the Trac source regarding this, but I suspect that the Trac source uses a similar regular expression. Since the \w character class is equivalent to [A-Za-z0-9_], the regular expression will pick up underscores as a valid part of the word. You might be able to work around this Trac behavior by slightly modifying the _update_acronyms method. Whenever you add an acronym into the self.compiled_acronyms list, add the "underlined version" of the acronym as well. If I'm understanding the source correctly, this would mean changing line 34 from:
self.acronyms[a] = (escape(d), escape(u), escape(s))
to something similar to:
self.acronyms[a] = (escape(d), escape(u), escape(s))
a_2 = "__%s__" % a
self.acronyms[a_2] = (escape(d), escape(u), escape(s))
The drawback is that this would double the length of the self.compiled_acronyms list and would cause Trac to spend more time processing acronyms. You would probably also want something more intelligent for constructing a_2 that will first verify that the acronym doesn't already have leading or trailing underscores or that the list doesn't already contain an acronym with that name.
I haven't had a chance to test this myself, so it's merely a conjecture at this point.
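The guarded version of that workaround might be sketched as below. It is untested against Trac and makes several assumptions: `add_acronym` is a hypothetical helper, the table is a plain dict, `html.escape` stands in for whatever `escape` the plugin imports, and `d`, `u`, `s` are treated as opaque strings.

```python
from html import escape  # stand-in for the plugin's own escape helper


def add_acronym(acronyms, a, d, u, s):
    """Store acronym `a` and, when safe, an underlined alias "__a__".

    `acronyms` is a plain dict here; the plugin's real container and the
    meaning of d/u/s are assumptions made for this sketch.
    """
    entry = (escape(d), escape(u), escape(s))
    acronyms[a] = entry

    # Skip the alias if the name already has leading or trailing underscores.
    if a.startswith("_") or a.endswith("_"):
        return

    a_2 = "__%s__" % a
    # setdefault keeps any acronym already defined under the alias name.
    acronyms.setdefault(a_2, entry)
```

For example, `add_acronym(table, "SCSI", "Small Computer Simple Interface", "", "")` would leave `table` with both `SCSI` and `__SCSI__` keys. Whether Trac would then hand the underlined text to the callback is still the open question raised in this comment.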
comment:7 Changed 14 years ago by
Summary: | Underlined acronyms aren't always converted → Valid acronyms with underlined wiki markup are not tagged as acronyms |
---|
comment:8 Changed 12 years ago by
Status: | assigned → new |
---|
comment:9 Changed 5 years ago by
Owner: | Ryan J Ollos deleted |
---|
Replying to AllenB:
Thank you for the detailed report. I'm surprised by this, given [9662]. I'll investigate now and see if a quick fix can be made.