#13534 closed enhancement (fixed)
better separator handling and case-insensitive
Reported by: | clemens | Owned by: | clemens |
---|---|---|---|
Priority: | normal | Component: | TracKeywordsPlugin |
Severity: | normal | Keywords: | |
Cc: | Trac Release: | 1.2 |
Description (last modified by )
Hello
I propose and provide some moderate improvements to the trac_keywords.js file of the TracKeywordsPlugin.
Summary
- allow arbitrary keyword separators
- better handling for multiple spaces and separators
- keyword matching is case-insensitive
- avoid duplicate insertion of keyword section
- add some comments
Patch
Separators
The idea is not only to allow separation of keywords by SPACE and coma as previously,
but also semi-colon, dash etc.
I make use of the \W
regex term to match all "non-word" characters as separators.
My changes also provide better handling in cases where multiple separators are used, for example bla,,,,blo
.
Case-insensitive
The previous version did not consider a keyword if the case did not match, i.e. it was case-sensitive.
One can discuss about it, but for me it is more useable and convenient if it behaves case-insensitive.
The keywords BLA
, bla
, Bla
and bLa
should be treated as the same.
I am aware that other people may not share my opinion and might like to have case-sensitive.
In this case it is easy to have a case-sensitive code variant.
Look for my code comments about case-insensitive
and toLowerCase
and remove the corresponding code.
Avoid Duplicates
I added a simple check to avoid duplicate insertion of the keyword section. This is because I came across a duplicate when I used the browser "save"-feature. The saved HTML page ended up having with two keyword sections (one was static HTML, the other dynamic from JS). My simple check avoids this.
Test case
The following test case uses:
- upper and lower case keywords
- numbers and letters
- different separators
- multiple separators
- trailing separators
power, Other A2, C3, double double more double, slave;radio,.+/:other eol-----epa.....
Clemens
Attachments (12)
Change History (27)
Changed 6 years ago by
Attachment: | trac_keywords.js added |
---|
comment:1 follow-ups: 2 3 Changed 6 years ago by
Incorrect patch. At least, uses of \W
break keyword with unicode characters.
See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions.
comment:2 Changed 6 years ago by
Replying to Jun Omae:
Incorrect patch. At least, uses of
\W
break keyword with unicode characters.
Dear Jun
Can you please give me your test cases (especially those with unicode.
Is it possible or not to use Unicode in the trac.ini
?
In other words will one be able to define unicode keywords for the TracKeywordsPlugin or not?
Indeed I was not aware about Unicode and the fact that for Javascript \W
matches any non-word character. Equivalent to [^A-Za-z0-9_]
. Unicode characters will appear as "non-word character".
I will think about a solution. My idea about \W
was to match as separator. I may replace it with a dedicated collection of separators, something like [ ;,-/.:|]
. I'll have to test this...
Clemens
comment:3 Changed 6 years ago by
About Unicode
Jun pointed out that my above patch from 2019-02-14
does not work with unicode.
The main "problem" is that for Javascript \W
(i.e. non-word characters) matches also on all unicode characters.
I have reworked (and tested) my code and my new patch is not using \W
but the following characters as keyword separators: [ .,;:|]
The TracKeywordsPlugin now handles cases properly if unicode characters appear in the keyword field.
The new test cases also include unicode:
- upper and lower case keywords
- numbers and letters
- different separators
- multiple separators
- trailing separators
- umlauts
- unicode
power, Other A2|C3, Ωףۺ,😮, 😲, double double more double, Umläüts, slave;radio, .+/:other eol-----epa.....
Be aware that the TracKeywordsPlugin does not really support to declare keywords (in the trac.ini
file) which contain special unicode characters. For example you cannot use the TracKeywordsPlugin to declare a "smily" 😲 as toggle field. However, this restriction is not due to my patch, it has been there also before. In fact I am not even sure if it is possible to use unicode in the trac.ini
file.
Changed 6 years ago by
Attachment: | trac_keywords_03032019.js added |
---|
reworked source code for better handling of unicode
Changed 6 years ago by
Attachment: | trac_keywords_20190303.js added |
---|
reworked source code for better handling of unicode (correct file name)
Changed 6 years ago by
Attachment: | trac_keywords_20190303.diff added |
---|
reworked patch file for better handling of unicode
comment:4 Changed 6 years ago by
Owner: | set to Ryan J Ollos |
---|---|
Status: | new → accepted |
comment:5 Changed 6 years ago by
The patch has the following line:
var newval = orig.replace(new RegExp('\\b' + w + '\\b'), '');
However, the replacement works unexpectedly.
> 'blah tracking trac-hacks trac hacks'.replace(/\btrac\b/,'') < "blah tracking -hacks trac hacks"
I think the line should be like:
var newval = $.grep(orig.split(/[;,\s]+/), function(v) { return v !== w }).join(' ');
Another thing, w
should be escaped if RegExp
is used.
Changed 6 years ago by
Attachment: | trac_keywords_20190307.diff added |
---|
repaired patch file (sorry my previous one was inversed)
comment:6 follow-up: 7 Changed 6 years ago by
Hello Jun and Ryan
Thanks for looking at my patch.
I have to aplogize, that I made a odd mistake:
By mistake my patchfile attachment:trac_keywords_20190303.diff is inversed, please do not use.
Instead please use these files:
This is the original code:
var newval = orig.replace(new RegExp('\\b' + w + '\\b'), '');
... this the patched code:
var newval = orig.replace(new RegExp('[ .,;:|]*' + w + '\\b','i'), '');
Sorry for mixing up things!
Some comments about your feedback...
var newval = $.grep(orig.split(/[;,\s]+/), function(v) { return v !== w }).join(' ');
As far as I understand this "split-join" code it works as follows:
- split the text at pre-defined separators
;,\s
- remove the searched keyword by replacing any occurance with "nothing"
- re-assemble the text using SPACE as separator
This is rather clever. But I am not sure if we should change the keyword text globally. Who says that SPACE is the "right" separator? People may like to use their own separator characters. I like SPACE, but for example the TRAC ticket batch-modify feature inserts commas for keyword modification.
The old approach on the other side modifies the the keyword text only locally i.e. only as much as necessary. All other parts of the text remain untouched - no matter how confuse they are.
The minus/hyphen character -
is not part of my proposed separator list [ .,;:|]
. We may add it. My approach was to use only a very restricted set of characters, which leave no doubt that they are intendet to be keyword separators.
I am not sure if it is recommended to use keywords with hyphen e.g. trac-hacks
. To avoid difficulties with ticket queries, my personal habit is to use only single word keywords, avoid hyphenated words like "trac-hack" or even phrases like "version control". I may add some more test cases to the TracKeywordsPlugin.
If a keyword appears double or multiple times (like C3, double double more double
in my test case), then my proposed plug-in code will remove only one after the other one, not all at once. I believe this is a feature. People may have a crazy reason for why to repeat the keyword. However, it does not make much sense to use double or multiple keywords - it may be a rare special case.
I am not an JavaScript expert, but I am not convinced that w
need to be escaped in this line:
var newval = orig.replace(new RegExp('[ .,;:|]*' + w + '\\b','i'), '');
w
is a JavaScript variable and not a string literal like '\\b'
. Also in the original code (without my patch) w
appears non-escaped and it works.
Clemens
comment:7 follow-up: 8 Changed 6 years ago by
Some comments about your feedback...
var newval = $.grep(orig.split(/[;,\s]+/), function(v) { return v !== w }).join(' ');As far as I understand this "split-join" code it works as follows:
- split the text at pre-defined separators
;,\s
- remove the searched keyword by replacing any occurance with "nothing"
- re-assemble the text using SPACE as separator
This is rather clever. But I am not sure if we should change the keyword text globally. Who says that SPACE is the "right" separator?
At least, Trac splits the keywords using [;,\s]+
and renders the words which are concatenated with space character at trac:source:/tags/trac-1.2.3/trac/ticket/web_ui.py@:1529,1531#L1521.
I am not an JavaScript expert, but I am not convinced that
w
need to be escaped in this line:var newval = orig.replace(new RegExp('[ .,;:|]*' + w + '\\b','i'), '');
w
is a JavaScript variable and not a string literal like'\\b'
. Also in the original code (without my patch)w
appears non-escaped and it works.
I don't consider \b
should be used.
- Add the following words to
[keywords]
section in trac.ini.[keywords] trac = trac-hacks = hacks =
- Visit newticket page.
- Click
trac-hacks
in Add Keywords- keywords field should be
trac-hacks
- keywords field should be
- Click
hacks
in Add Keywords- keywords field should be
trac-hacks hacks
buttrac-
.
- keywords field should be
Reason to escape the w
variable:
- Add
svn+ssh
to[keywords]
section. - Visit newticket page.
- Click
svn+ssh
in Add Keywordssvn+ssh
should be toggled but the keyword is repeatedly added.
Patch to escape:
-
trackeywords/htdocs/trac_keywords.js
old new 7 7 var el = document.getElementById(field_id); 8 8 var orig = el.value; 9 9 // remove the keyword including white spaces and separator, match case-insensitive 10 var newval = orig.replace(new RegExp('[ .,;:|]*' + w+ '\\b','i'), '');10 var newval = orig.replace(new RegExp('[ .,;:|]*' + regexp_escape(w) + '\\b','i'), ''); 11 11 var link = document.getElementById('trac-keyword-' + w); 12 12 if (orig != newval) { // remove tag. 13 13 if(link) link.className = ''; … … 19 19 el.value = newval.replace(/^[ .,;:|]+|[ .,;:|]+$/, ''); 20 20 } 21 21 22 function regexp_escape(text) { 23 // Escape characters except hyphon, comma, underscore, 0-9, A-Z and a-z 24 return text.replace(/[^-,_0-9A-Za-z]/g, '\\$&'); 25 } 26 22 27 var $fieldset = $('<fieldset id="trac-keywords" />'); 23 28 $fieldset.append('<legend>Add Keywords</legend>') 24 29 $ul = $('<ul></ul>');
Changed 6 years ago by
Attachment: | trac_keywords_20190308.diff added |
---|
Changed 6 years ago by
Attachment: | trac_keywords_20190308.js added |
---|
comment:8 follow-up: 9 Changed 6 years ago by
Hello Jun
Thanks for your explanation about the escape thing. I have integrated your regexp_escape()
function. And it works fine.
I have also inserted your "split-join" code. Basically it works, but see my issue about case-insensitive below.
We could remove the regexp_escape()
function again.
- attachment:trac_keywords_20190308.diff
- attachment:trac_keywords_20190308.js
- new test case:
power, Other A2 C3, Ωףۺ,😮, 😲, double double more double, , word-with-hyphens, trac-hacks, svn+ssh, Umläüts, slave;radio, .+/:other no|separator..... eol-----epa trailing-space
I have changed the code to allow only [;,\s]
as separators. We can negotiate about more separators. I am happy with those.
Things like trac-hacks
or svn+ssh
work now.
We may explain for our users in the Wiki plug-in documentation,
- that the plug-in accepts
[;,\s]
as separators, - that the plug-in re-assembles the text and replaces all separators with SPACE. This may be seen as feature not as problem.
There is one last thing which is very important for me: case-insensitive. Your "split-join" code is not case-insensitive. Can you please propose an advanced version, which (at least optionally) works case-insensitive?
var newval = $.grep(orig.split(/[;,\s]+/), function(v) { return v !== w }).join(' ');
Clemens
comment:9 follow-up: 10 Changed 6 years ago by
Replying to clemens:
There is one last thing which is very important for me: case-insensitive. Your "split-join" code is not case-insensitive. Can you please propose an advanced version, which (at least optionally) works case-insensitive?
var newval = $.grep(orig.split(/[;,\s]+/), function(v) { return v !== w }).join(' ');
Could you please try the following changes?
- var newval = $.grep(orig.split(/[;,\s]+/), function(v) { return v !== w }).join(' '); + var filter = function(v) { + return v.length !== w.length || v.toLowerCase() !== w.toLowerCase(); + }; + var newval = $.grep(orig.split(/[;,\s]+/), filter).join(' ');
Changed 6 years ago by
Attachment: | trac_keywords_20190309b.diff added |
---|
removed regexp_escape
function
comment:10 Changed 6 years ago by
Replying to Jun Omae:
Could you please try the following changes?
- var newval = $.grep(orig.split(/[;,\s]+/), function(v) { return v !== w }).join(' '); + var filter = function(v) { + return v.length !== w.length || v.toLowerCase() !== w.toLowerCase(); + }; + var newval = $.grep(orig.split(/[;,\s]+/), filter).join(' ');
Thanks, I tested your improved "split-join" code and it works very well! Keywords are now handled case-insensitive.
I removed the regexp_escape()
function. We do not need it anymore.
function regexp_escape(text) { // Escape special regex characters (except hyphen, comma, underscore, 0-9, A-Z and a-z) return text.replace(/[^-,_0-9A-Za-z]/g, '\\$&'); }
Here is the new code:
- attachment:trac_keywords_20190309b.diff
- attachment:trac_keywords_20190309b.js
- most recent test case:
power, Other a2 C3, Ωףۺ,😮, 😲, ☞uni⚠code☯, double double more double, , word-with-hyphens, trac-hacks, svn+ssh, Umläüts, slave;radio, .+/:other no|separator..... eol-----epa trailing-space
Please find below the summary of improvement which we reached so far. In my opinion a major step forward:
- Whitespace, comma and semicolom i.e.
[;,\s]
are allowed as keyword separators.
(Previously only SPACE worked well.) - Keyword matching is case-insensitive now. This is a real advantage, because often users use their arbitary case stlye, especially for acronyms.
(Previously "HTML" and "html" and "Html" were treated differently.) - Better handling for multiple spaces and separators.
(Previously things likebla, , blo
were not treated well.) - Keywords with special characters (unicode and umlauts) are supported now. Great improvement for many non-English users.
(Previously the use of\b
regex operator did not work well with special characters.) - Avoid duplicate insertion of keyword section. This is useful if the ticket is saved to static HTML file.
(Previously the "keyword" section appeared twice in a static HTML file.) - More code comments in scope of good engineering practice.
There is one last thing, which might be desireable: It would be nice if the keyword list ("trac-keywords" fieldset) could auto-update if someone would manually edit the keyword field. However, I have no idea how to accomplish this and I am afraid that this feature would require quite much effort. I can pretty well live without it. Unless somebody has a brilliant idea I would propose to continue with the existing improvements...
comment:12 Changed 6 years ago by
Hello
The new plug-in code is running on our local TRAC installation. It works well. Everybody is happy with what we have reached.
Is there anything more I can help to promote this ticket?
Clemens
comment:13 Changed 5 years ago by
Description: | modified (diff) |
---|
comment:15 Changed 5 years ago by
Owner: | changed from Ryan J Ollos to clemens |
---|
updated source code