Opened 4 years ago

Last modified 15 months ago

#8377 new defect

Valid acronyms with underlined wiki markup are not tagged as acronyms

Reported by: AllenB
Owned by: rjollos
Priority: normal
Component: AcronymsPlugin
Severity: normal
Keywords:
Cc:
Trac Release: 0.12

Description (last modified by rjollos)

When an acronym is underlined, AcronymsPlugin does not detect it or add the <acronym> tag to it. If the acronym contains other style elements "inside" the underline, then the acronym is tagged as expected.

For example, take the following wiki content:
Acronym test: SCSI __SCSI__ '''SCSI''' ''SCSI'' '''''SCSI''''' __''SCSI''__ __'''SCSI'''__ ''__SCSI__'' '''__SCSI__''' '''__''SCSI''__'''

The following HTML is generated (newlines added for readability):

Acronym test:
<acronym title="Small Computer Simple Interface">SCSI</acronym>
<span class="underline">SCSI</span>
<strong><acronym title="Small Computer Simple Interface">SCSI</acronym></strong>
<em><acronym title="Small Computer Simple Interface">SCSI</acronym></em>
<strong><em><acronym title="Small Computer Simple Interface">SCSI</acronym></em></strong>
<span class="underline"><em><acronym title="Small Computer Simple Interface">SCSI</acronym></em></span>
<span class="underline"><strong><acronym title="Small Computer Simple Interface">SCSI</acronym></strong></span>
<em><span class="underline">SCSI</span></em>
<strong><span class="underline">SCSI</span></strong>
<strong><span class="underline"><em><acronym title="Small Computer Simple Interface">SCSI</acronym></em></span></strong>

which displays as:

Acronym test: SCSI SCSI SCSI SCSI SCSI SCSI SCSI SCSI SCSI SCSI

The underlined text was not made into an acronym. The underline + italics and underline + bold cases were, but only if the underline markup was on the *outside* of the bold/italics markup. Curiously enough, the underline + bold + italics case works as long as the underline is not the innermost markup element.

My guess is that the parser is allowing underscores in an acronym and is interpreting the double-underscore as part of the acronym (thus it doesn't match anything in the acronym list so it doesn't get tagged). Stripping off leading and trailing non-alphanumeric characters before comparing the text to the acronym list should fix this problem, but I haven't tried to patch it myself so I can't say for sure.

Attachments (0)

Change History (8)

comment:1 Changed 4 years ago by rjollos

  • Description modified (diff)
  • Priority changed from normal to high
  • Status changed from new to assigned

comment:2 in reply to: ↑ description Changed 4 years ago by rjollos

Replying to AllenB:

My guess is that the parser is allowing underscores in an acronym and is interpreting the double-underscore as part of the acronym (thus it doesn't match anything in the acronym list so it doesn't get tagged). Stripping off leading and trailing non-alphanumeric characters before comparing the text to the acronym list should fix this problem, but I haven't tried to patch it myself so I can't say for sure.

Thank you for the detailed report. I'm surprised by this, given [9662]. I'll investigate now and see if a quick fix can be made.

comment:3 Changed 4 years ago by rjollos

(In [9740]) Strip trailing whitespace when parsing the AcronymDefinitions page. Previously, whitespace after the end of a row in the table would prevent that row from being parsed. Refs #8377.

comment:4 follow-up: ↓ 6 Changed 4 years ago by rjollos

The issue here appears to be similar to what I found in comment:1:ticket:8267.

We implement the IWikiSyntaxProvider method:

    # IWikiSyntaxProvider methods
    def get_wiki_syntax(self):
        if self.compiled_acronyms:
            yield (self.compiled_acronyms, self._acronym_formatter)

__SCSI__ is not passed to the callback _acronym_formatter, so it is not being matched to compiled_acronyms. I'm not sure if this is an internal issue with Trac, or if I can somehow modify compiled_acronyms, which for my test page looks like:

\b(?P<acronym>RFC2316|SCSI|ROM|URL|RFC)(?P<acronymselector>\w*)\b
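The pattern above can be tried on its own, outside Trac, to see which of the reported cases it matches. The following is a minimal sketch, assuming only Python's standard re module; the helper name finds_acronym and the samples list are made up for illustration, and this does not reproduce how Trac combines the pattern with its own wiki syntax rules. Since \b needs a word/non-word transition and the underscore counts as a word character, there is no boundary between "__" and "SCSI", which may be why the callback is never reached.

```python
import re

# Pattern copied verbatim from the ticket (comment:4).
ACRONYM_RE = re.compile(
    r"\b(?P<acronym>RFC2316|SCSI|ROM|URL|RFC)(?P<acronymselector>\w*)\b"
)


def finds_acronym(wiki_text):
    """Return True if the acronym pattern matches anywhere in wiki_text.

    Hypothetical helper for this experiment only; not part of the plugin.
    """
    return ACRONYM_RE.search(wiki_text) is not None


# A subset of the markup variants from the ticket description.
samples = [
    "SCSI",
    "__SCSI__",
    "'''SCSI'''",
    "__''SCSI''__",
    "''__SCSI__''",
    "'''__''SCSI''__'''",
]

results = dict((text, finds_acronym(text)) for text in samples)

for text in samples:
    print("%-22s %s" % (text, "match" if results[text] else "no match"))
```

In this standalone test, the variants where the underscores directly touch the acronym do not match, and the variants where quote characters sit between the underscores and the acronym do. That lines up with the HTML in the description, though it does not prove that the same thing happens inside Trac's formatter.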

comment:5 Changed 4 years ago by rjollos

  • Priority changed from high to normal

See also #857. I suspect all these tickets (#857, #8267, #8377) are related.

Since I'm stuck on this at the moment and have a number of other tickets that I know how to fix, I'm dropping the priority and will keep an eye out for a solution as I study the Trac source code.

If you want to do any additional investigation, I'll take quick action on any hints or a patch.

comment:6 in reply to: ↑ 4 Changed 4 years ago by AllenB

Replying to rjollos:

__SCSI__ is not passed to the callback _acronym_formatter, so it is not being matched to compiled_acronyms. I'm not sure if this is an internal issue with Trac, or if I can somehow modify compiled_acronyms, which for my test page looks like:

\b(?P<acronym>RFC2316|SCSI|ROM|URL|RFC)(?P<acronymselector>\w*)\b

If Trac isn't passing the text to the callback, then I don't think that it's an error in the plugin. You might be able to work around it, however.

I haven't delved too deeply into the Trac source regarding this, but I suspect that the Trac source uses a similar regular expression. Since the \w character class is equivalent to [A-Za-z0-9_], the regular expression will pick up underscores as a valid part of the word. You might be able to work around this Trac behavior by slightly modifying the _update_acronyms method. Whenever you add an acronym into the self.compiled_acronyms list, add the "underlined version" of the acronym as well. If I'm understanding the source correctly, this would mean changing line 34 from:

self.acronyms[a] = (escape(d), escape(u), escape(s))

to something similar to:

self.acronyms[a] = (escape(d), escape(u), escape(s))
a_2 = "__%s__" % a
self.acronyms[a_2] = (escape(d), escape(u), escape(s))

The drawback is that this would double the length of the self.compiled_acronyms list and would cause Trac to spend more time processing acronyms. You would probably also want something more intelligent for constructing a_2 that will first verify that the acronym doesn't already have leading or trailing underscores or that the list doesn't already contain an acronym with that name.

I haven't had a chance to test this myself, so it's merely a conjecture at this point.
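The "something more intelligent" for constructing the underlined names could look roughly like the sketch below. It works on a plain dict rather than the plugin's self.acronyms, and the function name with_underlined_variants is hypothetical. It is untested inside Trac, and it says nothing about whether Trac's own underline rule consumes the "__" before the acronym pattern is tried.

```python
def with_underlined_variants(acronyms):
    """Return a copy of acronyms with a "__NAME__" entry added per acronym.

    Hypothetical helper: acronyms maps an acronym string to its
    (description, url, selector) tuple, as in the snippet above.
    """
    result = dict(acronyms)
    for name, definition in acronyms.items():
        # Skip names that already carry leading or trailing underscores.
        if name.startswith("_") or name.endswith("_"):
            continue
        underlined = "__%s__" % name
        # Do not overwrite an entry the wiki page already defines.
        if underlined in acronyms:
            continue
        result[underlined] = definition
    return result


# Example table; the second entry stands in for a user-defined
# acronym that already has underscores in its name.
table = {
    "SCSI": ("Small Computer Simple Interface", "", ""),
    "__ROM__": ("Read Only Memory", "", ""),
}
expanded = with_underlined_variants(table)
print(sorted(expanded))
```

Returning a new dict instead of mutating in place keeps the original table available if the extra entries turn out to be too slow or to conflict with other markup.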

comment:7 Changed 4 years ago by rjollos

  • Summary changed from "Underlined acronyms aren't always converted" to "Valid acronyms with underlined wiki markup are not tagged as acronyms"

comment:8 Changed 15 months ago by rjollos

  • Status changed from assigned to new
