Opened 3 years ago

Closed 2 years ago

# better separator handling and case-insensitive

Reported by: Owned by: clemens clemens normal TracKeywordsPlugin normal 1.2

Hello

I propose and provide some moderate improvements to the trac_keywords.js file of the TracKeywordsPlugin.

# Summary

1. allow arbitrary keyword separators
2. better handling for multiple spaces and separators
3. keyword matching is case-insensitive
4. avoid duplicate insertion of keyword section

# Separators

The idea is not only to allow separation of keywords by SPACE and coma as previously, but also semi-colon, dash etc. I make use of the \W regex term to match all "non-word" characters as separators.

My changes also provide better handling in cases where multiple separators are used, for example bla,,,,blo.

# Case-insensitive

The previous version did not consider a keyword if the case did not match, i.e. it was case-sensitive. One can discuss about it, but for me it is more useable and convenient if it behaves case-insensitive. The keywords BLA, bla, Bla and bLa should be treated as the same.

I am aware that other people may not share my opinion and might like to have case-sensitive. In this case it is easy to have a case-sensitive code variant. Look for my code comments about case-insensitive and toLowerCase and remove the corresponding code.

# Avoid Duplicates

I added a simple check to avoid duplicate insertion of the keyword section. This is because I came across a duplicate when I used the browser "save"-feature. The saved HTML page ended up having with two keyword sections (one was static HTML, the other dynamic from JS). My simple check avoids this.

# Test case

The following test case uses:

• upper and lower case keywords
• numbers and letters
• different separators
• multiple separators
• trailing separators

power, Other A2, C3, double double more double, slave;radio,.+/:other     eol-----epa.....


Clemens

### Changed 3 years ago by clemens

updated source code

patch file

### comment:1 follow-ups:  2  3 Changed 3 years ago by Jun Omae

Incorrect patch. At least, uses of \W break keyword with unicode characters. See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions.

### comment:2 in reply to:  1 Changed 3 years ago by anonymous

Incorrect patch. At least, uses of \W break keyword with unicode characters.

Dear Jun

Can you please give me your test cases (especially those with unicode.

Is it possible or not to use Unicode in the trac.ini?
In other words will one be able to define unicode keywords for the TracKeywordsPlugin or not?

Indeed I was not aware about Unicode and the fact that for Javascript \W matches any non-word character. Equivalent to [^A-Za-z0-9_]. Unicode characters will appear as "non-word character".

I will think about a solution. My idea about \W was to match as separator. I may replace it with a dedicated collection of separators, something like [ ;,-/.:|]. I'll have to test this...

Clemens

### comment:3 in reply to:  1 Changed 3 years ago by clemens

Jun pointed out that my above patch from 2019-02-14 does not work with unicode. The main "problem" is that for Javascript \W (i.e. non-word characters) matches also on all unicode characters.

I have reworked (and tested) my code and my new patch is not using \W but the following characters as keyword separators: [ .,;:|]

The TracKeywordsPlugin now handles cases properly if unicode characters appear in the keyword field.

The new test cases also include unicode:

• upper and lower case keywords
• numbers and letters
• different separators
• multiple separators
• trailing separators
• umlauts
• unicode

 power, Other A2|C3, Ωףۺ,😮, 😲, double double more double, Umläüts, slave;radio, .+/:other eol-----epa.....

Be aware that the TracKeywordsPlugin does not really support to declare keywords (in the trac.ini file) which contain special unicode characters. For example you cannot use the TracKeywordsPlugin to declare a "smily" 😲 as toggle field. However, this restriction is not due to my patch, it has been there also before. In fact I am not even sure if it is possible to use unicode in the trac.ini file.

### Changed 3 years ago by clemens

reworked source code for better handling of unicode

### Changed 3 years ago by clemens

reworked source code for better handling of unicode (correct file name)

### Changed 3 years ago by clemens

reworked patch file for better handling of unicode

### comment:4 Changed 3 years ago by Ryan J Ollos

Owner: set to Ryan J Ollos new → accepted

### comment:5 Changed 3 years ago by Jun Omae

The patch has the following line:

    var newval = orig.replace(new RegExp('\\b' + w + '\\b'), '');


However, the replacement works unexpectedly.

> 'blah tracking trac-hacks trac hacks'.replace(/\btrac\b/,'')
< "blah tracking -hacks trac hacks"


I think the line should be like:

    var newval = $.grep(orig.split(/[;,\s]+/), function(v) { return v !== w }).join(' ');  Another thing, w should be escaped if RegExp is used. ### Changed 3 years ago by clemens Attachment: trac_keywords_20190307.diff​ added repaired patch file (sorry my previous one was inversed) ### comment:6 follow-up: 7 Changed 3 years ago by clemens Hello Jun and Ryan Thanks for looking at my patch. I have to aplogize, that I made a odd mistake: By mistake my patchfile attachment:trac_keywords_20190303.diff is inversed, please do not use. Instead please use these files: This is the original code: var newval = orig.replace(new RegExp('\\b' + w + '\\b'), '');  ... this the patched code: var newval = orig.replace(new RegExp('[ .,;:|]*' + w + '\\b','i'), '');  Sorry for mixing up things! Some comments about your feedback...  var newval =$.grep(orig.split(/[;,\s]+/), function(v) { return v !== w }).join(' ');


As far as I understand this "split-join" code it works as follows:

1. split the text at pre-defined separators ;,\s
2. remove the searched keyword by replacing any occurance with "nothing"
3. re-assemble the text using SPACE as separator

This is rather clever. But I am not sure if we should change the keyword text globally. Who says that SPACE is the "right" separator? People may like to use their own separator characters. I like SPACE, but for example the TRAC ticket batch-modify feature inserts commas for keyword modification.
The old approach on the other side modifies the the keyword text only locally i.e. only as much as necessary. All other parts of the text remain untouched - no matter how confuse they are.

The minus/hyphen character - is not part of my proposed separator list [ .,;:|]. We may add it. My approach was to use only a very restricted set of characters, which leave no doubt that they are intendet to be keyword separators. I am not sure if it is recommended to use keywords with hyphen e.g. trac-hacks. To avoid difficulties with ticket queries, my personal habit is to use only single word keywords, avoid hyphenated words like "trac-hack" or even phrases like "version control". I may add some more test cases to the TracKeywordsPlugin.

If a keyword appears double or multiple times (like C3, double double more double in my test case), then my proposed plug-in code will remove only one after the other one, not all at once. I believe this is a feature. People may have a crazy reason for why to repeat the keyword. However, it does not make much sense to use double or multiple keywords - it may be a rare special case.

I am not an JavaScript expert, but I am not convinced that w need to be escaped in this line:

var newval = orig.replace(new RegExp('[ .,;:|]*' + w + '\\b','i'), '');


w is a JavaScript variable and not a string literal like '\\b'. Also in the original code (without my patch) w appears non-escaped and it works.

Clemens

  var newval = $.grep(orig.split(/[;,\s]+/), function(v) { return v !== w }).join(' ');  As far as I understand this "split-join" code it works as follows: 1. split the text at pre-defined separators ;,\s 2. remove the searched keyword by replacing any occurance with "nothing" 3. re-assemble the text using SPACE as separator This is rather clever. But I am not sure if we should change the keyword text globally. Who says that SPACE is the "right" separator? At least, Trac splits the keywords using [;,\s]+ and renders the words which are concatenated with space character at trac:source:/tags/trac-1.2.3/trac/ticket/web_ui.py@:1529,1531#L1521. I am not an JavaScript expert, but I am not convinced that w need to be escaped in this line: var newval = orig.replace(new RegExp('[ .,;:|]*' + w + '\\b','i'), '');  w is a JavaScript variable and not a string literal like '\\b'. Also in the original code (without my patch) w appears non-escaped and it works. I don't consider \b should be used. 1. Add the following words to [keywords] section in trac.ini. [keywords] trac = trac-hacks = hacks =  2. Visit newticket page. 3. Click trac-hacks in Add Keywords • keywords field should be trac-hacks 4. Click hacks in Add Keywords • keywords field should be trac-hacks hacks but trac-. Reason to escape the w variable: 1. Add svn+ssh to [keywords] section. 2. Visit newticket page. 3. Click svn+ssh in Add Keywords • svn+ssh should be toggled but the keyword is repeatedly added. Patch to escape: • ## trackeywords/htdocs/trac_keywords.js  old var el = document.getElementById(field_id); var orig = el.value; // remove the keyword including white spaces and separator, match case-insensitive var newval = orig.replace(new RegExp('[ .,;:|]*' + w + '\\b','i'), ''); var newval = orig.replace(new RegExp('[ .,;:|]*' + regexp_escape(w) + '\\b','i'), ''); var link = document.getElementById('trac-keyword-' + w); if (orig != newval) { // remove tag. if(link) link.className = ''; el.value = newval.replace(/^[ .,;:|]+|[ .,;:|]+$/, ''); } function regexp_escape(text) { // Escape characters except hyphon, comma, underscore, 0-9, A-Z and a-z return text.replace(/[^-,_0-9A-Za-z]/g, '\\$&'); } var$fieldset = $(' ');$fieldset.append('Add Keywords') $ul =$('
');

### comment:8 in reply to:  7 ; follow-up:  9 Changed 3 years ago by clemens

Hello Jun

Thanks for your explanation about the escape thing. I have integrated your regexp_escape() function. And it works fine.

I have also inserted your "split-join" code. Basically it works, but see my issue about case-insensitive below. We could remove the regexp_escape() function again.

I have changed the code to allow only [;,\s] as separators. We can negotiate about more separators. I am happy with those. Things like trac-hacks or svn+ssh work now.

We may explain for our users in the Wiki plug-in documentation,

• that the plug-in accepts [;,\s] as separators,
• that the plug-in re-assembles the text and replaces all separators with SPACE. This may be seen as feature not as problem.

There is one last thing which is very important for me: case-insensitive. Your "split-join" code is not case-insensitive. Can you please propose an advanced version, which (at least optionally) works case-insensitive?

  var newval = $.grep(orig.split(/[;,\s]+/), function(v) { return v !== w }).join(' ');  Clemens ### comment:9 in reply to: 8 ; follow-up: 10 Changed 3 years ago by Jun Omae Replying to clemens: There is one last thing which is very important for me: case-insensitive. Your "split-join" code is not case-insensitive. Can you please propose an advanced version, which (at least optionally) works case-insensitive?  var newval =$.grep(orig.split(/[;,\s]+/), function(v) { return v !== w }).join(' ');


Could you please try the following changes?

-    var newval = $.grep(orig.split(/[;,\s]+/), function(v) { return v !== w }).join(' '); + var filter = function(v) { + return v.length !== w.length || v.toLowerCase() !== w.toLowerCase(); + }; + var newval =$.grep(orig.split(/[;,\s]+/), filter).join(' ');


### Changed 3 years ago by clemens

works well now

### Changed 3 years ago by clemens

works well now

### Changed 3 years ago by clemens

removed regexp_escape function

### Changed 3 years ago by clemens

removed regexp_escape function

### comment:10 in reply to:  9 Changed 3 years ago by clemens

Could you please try the following changes?

-    var newval = $.grep(orig.split(/[;,\s]+/), function(v) { return v !== w }).join(' '); + var filter = function(v) { + return v.length !== w.length || v.toLowerCase() !== w.toLowerCase(); + }; + var newval =$.grep(orig.split(/[;,\s]+/), filter).join(' ');


Thanks, I tested your improved "split-join" code and it works very well! Keywords are now handled case-insensitive.

I removed the regexp_escape() function. We do not need it anymore.

  function regexp_escape(text) {
// Escape special regex characters (except hyphen, comma, underscore, 0-9, A-Z and a-z)
return text.replace(/[^-,_0-9A-Za-z]/g, '\\\$&');
}


Here is the new code:

Please find below the summary of improvement which we reached so far. In my opinion a major step forward:

1. Whitespace, comma and semicolom i.e. [;,\s] are allowed as keyword separators.
(Previously only SPACE worked well.)
2. Keyword matching is case-insensitive now. This is a real advantage, because often users use their arbitary case stlye, especially for acronyms.
(Previously "HTML" and "html" and "Html" were treated differently.)
3. Better handling for multiple spaces and separators.
(Previously things like bla, , blo were not treated well.)
4. Keywords with special characters (unicode and umlauts) are supported now. Great improvement for many non-English users.
(Previously the use of \b regex operator did not work well with special characters.)
5. Avoid duplicate insertion of keyword section. This is useful if the ticket is saved to static HTML file.
(Previously the "keyword" section appeared twice in a static HTML file.)
6. More code comments in scope of good engineering practice.

There is one last thing, which might be desireable: It would be nice if the keyword list ("trac-keywords" fieldset) could auto-update if someone would manually edit the keyword field. However, I have no idea how to accomplish this and I am afraid that this feature would require quite much effort. I can pretty well live without it. Unless somebody has a brilliant idea I would propose to continue with the existing improvements...

### comment:11 Changed 3 years ago by Ryan J Ollos

No need to attach the source file, just the diff is all we need.

### comment:12 Changed 2 years ago by clemens

Hello

The new plug-in code is running on our local TRAC installation. It works well. Everybody is happy with what we have reached.

Is there anything more I can help to promote this ticket?

Clemens

### comment:13 Changed 2 years ago by Ryan J Ollos

Description: modified (diff)

### comment:14 Changed 2 years ago by Ryan J Ollos

Resolution: → fixed accepted → closed

In 17396:

TracKeywordsPlugin 1.0dev: Improve separator handling

Patch by clemens.

Fixes #13534.

### comment:15 Changed 2 years ago by Ryan J Ollos

Owner: changed from Ryan J Ollos to clemens

### Modify Ticket

Change Properties