Opened 11 years ago

# Valid acronyms with underlined wiki markup are not tagged as acronyms

Reported by: Ben Allen | Owned by: (none) | Priority: normal | Component: AcronymsPlugin | Severity: normal | Release: 0.12

When an acronym is underlined, AcronymsPlugin does not detect it or add the <acronym> tag to it. If the acronym contains other style elements "inside" the underline, then the acronym is tagged as expected.

For example, take the following wiki content:

    Acronym test: SCSI __SCSI__ '''SCSI''' ''SCSI'' '''''SCSI''''' __''SCSI''__ __'''SCSI'''__ ''__SCSI__'' '''__SCSI__''' '''__''SCSI''__'''

This generates the following HTML:

    Acronym test:
    <acronym title="Small Computer Simple Interface">SCSI</acronym>
    <span class="underline">SCSI</span>
    <strong><acronym title="Small Computer Simple Interface">SCSI</acronym></strong>
    <em><acronym title="Small Computer Simple Interface">SCSI</acronym></em>
    <strong><em><acronym title="Small Computer Simple Interface">SCSI</acronym></em></strong>
    <span class="underline"><em><acronym title="Small Computer Simple Interface">SCSI</acronym></em></span>
    <span class="underline"><strong><acronym title="Small Computer Simple Interface">SCSI</acronym></strong></span>
    <em><span class="underline">SCSI</span></em>
    <strong><span class="underline">SCSI</span></strong>
    <strong><span class="underline"><em><acronym title="Small Computer Simple Interface">SCSI</acronym></em></span></strong>


which displays as:

Acronym test: SCSI SCSI SCSI SCSI SCSI SCSI SCSI SCSI SCSI SCSI

The underlined text was not made into an acronym. The underline + italics and underline + bold cases were, but only if the underline markup was on the *outside* of the bold/italics markup. Curiously enough, the underline + bold + italics case works as long as the underline is not the innermost markup element.

My guess is that the parser is allowing underscores in an acronym and is interpreting the double-underscore as part of the acronym (thus it doesn't match anything in the acronym list so it doesn't get tagged). Stripping off leading and trailing non-alphanumeric characters before comparing the text to the acronym list should fix this problem, but I haven't tried to patch it myself so I can't say for sure.
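The stripping idea could be sketched roughly as follows. This is only an illustration that assumes the plugin actually receives the raw token (later comments suggest it may not); the `ACRONYMS` table and the `lookup_acronym` helper are hypothetical names, not part of AcronymsPlugin.

```python
import re

# Hypothetical acronym table; the real plugin reads its definitions from a wiki page.
ACRONYMS = {"SCSI": "Small Computer Simple Interface"}


def lookup_acronym(token):
    """Strip leading/trailing non-alphanumeric characters, then look the token up.

    Returns the acronym title, or None if the stripped token is not defined.
    """
    stripped = re.sub(r"^[^A-Za-z0-9]+|[^A-Za-z0-9]+$", "", token)
    return ACRONYMS.get(stripped)


print(lookup_acronym("__SCSI__"))
print(lookup_acronym("SCSI_2"))
```

Only the edges are stripped, so a token such as `SCSI_2` keeps its inner underscore and is not treated as `SCSI`.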

### comment:1 Changed 11 years ago by Ryan J Ollos

Description: modified (diff)
Priority: normal → high
Status: new → assigned

### comment:2 in reply to:  description Changed 11 years ago by Ryan J Ollos

> My guess is that the parser is allowing underscores in an acronym and is interpreting the double-underscore as part of the acronym (thus it doesn't match anything in the acronym list so it doesn't get tagged). Stripping off leading and trailing non-alphanumeric characters before comparing the text to the acronym list should fix this problem, but I haven't tried to patch it myself so I can't say for sure.

Thank you for the detailed report. I'm surprised by this, given [9662]. I'll investigate now and see if a quick fix can be made.

### comment:3 Changed 11 years ago by Ryan J Ollos

(In [9740]) Strip trailing whitespace when parsing the AcronymDefinitions page. Previously, whitespace after the end of a row in the table would prevent that row from being parsed. Refs #8377.

### comment:4 follow-up:  6 Changed 11 years ago by Ryan J Ollos

The issue here appears to be similar to what I found in comment:1:ticket:8267.

We implement the IWikiSyntaxProvider method:

    # IWikiSyntaxProvider methods
    def get_wiki_syntax(self):
        if self.compiled_acronyms:
            yield (self.compiled_acronyms, self._acronym_formatter)


__SCSI__ is not passed to the callback _acronym_formatter, so it is not being matched to compiled_acronyms. I'm not sure if this is an internal issue with Trac, or if I can somehow modify compiled_acronyms, which for my test page looks like:

    \b(?P<acronym>RFC2316|SCSI|ROM|URL|RFC)(?P<acronymselector>\w*)\b
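The pattern can also be exercised outside Trac with Python's `re` module. The following is a minimal sketch, not Trac's actual matching code: the pattern is copied from above, and the sample strings are taken from the test line in the description. It shows which samples the pattern finds on its own, which may help separate the regular expression's behavior from Trac's.

```python
import re

# Pattern copied from this comment; it is matched here in isolation,
# not through Trac's wiki formatter.
compiled_acronyms = re.compile(
    r"\b(?P<acronym>RFC2316|SCSI|ROM|URL|RFC)(?P<acronymselector>\w*)\b"
)

# Sample strings taken from the test line in the ticket description.
samples = ["SCSI", "__SCSI__", "'''SCSI'''", "__''SCSI''__", "''__SCSI__''"]

# True where the pattern finds an acronym somewhere in the sample.
results = dict((s, compiled_acronyms.search(s) is not None) for s in samples)

for text in samples:
    print("%r: %s" % (text, "match" if results[text] else "no match"))
```

In this isolated check the samples where an underscore sits directly next to the acronym do not match, because `\b` only matches between a word character and a non-word character, and `_` counts as a word character.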


### comment:5 Changed 11 years ago by Ryan J Ollos

Priority: high → normal

See also #857. I suspect all these tickets (#857, #8267, #8377) are related.

Since I'm stuck on this at the moment and have a number of other tickets that I know how to fix, I'm dropping the priority and will keep an eye out for a solution as I study the Trac source code.

If you want to do any additional investigation, I'll take quick action on any hints or a patch.

### comment:6 in reply to:  4 Changed 11 years ago by Ben Allen

> __SCSI__ is not passed to the callback _acronym_formatter, so it is not being matched to compiled_acronyms. I'm not sure if this is an internal issue with Trac, or if I can somehow modify compiled_acronyms, which for my test page looks like:
>
>     \b(?P<acronym>RFC2316|SCSI|ROM|URL|RFC)(?P<acronymselector>\w*)\b


If Trac isn't passing the text to the callback, then I don't think that it's an error in the plugin. You might be able to work around it, however.

I haven't delved too deeply into the Trac source regarding this, but I suspect that the Trac source uses a similar regular expression. Since the \w character class is equivalent to [A-Za-z0-9_], the regular expression will pick up underscores as a valid part of the word. You might be able to work around this Trac behavior by slightly modifying the _update_acronyms method. Whenever you add an acronym into the self.compiled_acronyms list, add the "underlined version" of the acronym as well. If I'm understanding the source correctly, this would mean changing line 34 from:

    self.acronyms[a] = (escape(d), escape(u), escape(s))


to something similar to:

    self.acronyms[a] = (escape(d), escape(u), escape(s))
    a_2 = "__%s__" % a
    self.acronyms[a_2] = (escape(d), escape(u), escape(s))


The drawback is that this would double the length of the self.compiled_acronyms list and would cause Trac to spend more time processing acronyms. You would probably also want something more intelligent for constructing a_2 that will first verify that the acronym doesn't already have leading or trailing underscores or that the list doesn't already contain an acronym with that name.

I haven't had a chance to test this myself, so it's merely a conjecture at this point.
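The "more intelligent" construction of the underlined variants might look something like the sketch below. It is standalone and equally untested against Trac: `add_underlined_variants` is a hypothetical helper name, and the dictionary only stands in for `self.acronyms`. Whether Trac would then pass the underlined form to the callback is still an open question.

```python
def add_underlined_variants(acronyms):
    """Return a copy of acronyms that also maps '__NAME__' to NAME's entry.

    Names that already begin or end with an underscore are left alone, and
    an explicitly defined '__NAME__' entry is never overwritten.
    """
    expanded = dict(acronyms)
    for name, entry in acronyms.items():
        # Skip acronyms that already carry leading or trailing underscores.
        if name.startswith("_") or name.endswith("_"):
            continue
        variant = "__%s__" % name
        # Keep an existing definition if the list already contains this name.
        expanded.setdefault(variant, entry)
    return expanded


# Stand-in for self.acronyms; the tuple values are placeholders.
example = {
    "SCSI": ("description", "url", "selector"),
    "_X_": ("already underscored",),
    "ROM": ("plain ROM",),
    "__ROM__": ("explicitly defined",),
}
expanded = add_underlined_variants(example)
print(sorted(expanded))
```

The dictionary still grows by up to one entry per acronym, so the processing-time drawback mentioned above applies unchanged.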

### comment:7 Changed 11 years ago by Ryan J Ollos

Summary: Underlined acronyms aren't always converted → Valid acronyms with underlined wiki markup are not tagged as acronyms

### comment:8 Changed 8 years ago by Ryan J Ollos

Status: assigned → new

### comment:9 Changed 18 months ago by Ryan J Ollos

Owner: Ryan J Ollos deleted
