Opened 5 years ago

Last modified 2 years ago

#5938 new defect

Queries containing '.' return 0 results from RepoSearch

Reported by: bugs@… Owned by: rjollos
Priority: high Component: RepoSearchPlugin
Severity: normal Keywords:
Cc: Trac Release: 0.11

Description

A search query that contains '.' will not return results from the repository. For example, a file that has been indexed contains the string 'a.smith', as does a ticket entry. Searching for 'a.smith' returns only a hit from the ticket, not from the repo. Searching for 'smith' returns hits from both the repo and the tickets. My repo consists entirely of router configurations, so being able to submit queries with dots in them (e.g. IP addresses) is a rather important bit of functionality.

Attachments (1)

indexer.py.patch (357 bytes) - added by fernan 4 years ago.
suggested change in regexp

Change History (11)

comment:1 Changed 5 years ago by athomas

Tokenisation is governed by tracreposearch.indexer.Indexer._strip; you'll have to modify the source to treat "." as a valid word character.

This would probably be useful to expose as a config option.
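A minimal sketch of what such an option could look like. The plugin's actual _strip code is not shown in this ticket, so the function name, both pattern constants and the lowercasing step are assumptions made for illustration only:

```python
import re

# Hypothetical stand-ins; not the plugin's real code or option names.
DEFAULT_WORD_PATTERN = r"\w+"            # letters, digits and underscore only
DOTTED_WORD_PATTERN = r"\w+(?:\.\w+)*"   # words optionally joined by single dots


def tokenize(text, word_pattern=DEFAULT_WORD_PATTERN):
    """Return the lowercased tokens that word_pattern finds in text."""
    return [token.lower() for token in re.findall(word_pattern, text)]


sample = "user a.smith 10.0.0.1"
print(tokenize(sample))
# ['user', 'a', 'smith', '10', '0', '0', '1']
print(tokenize(sample, DOTTED_WORD_PATTERN))
# ['user', 'a.smith', '10.0.0.1']
```

With the default pattern, 'a.smith' never exists as a single token, which is consistent with the reported behaviour. Whatever pattern is chosen would presumably have to be applied to both the indexed text and the search query, followed by a reindex.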

comment:2 Changed 4 years ago by rjollos

  • Owner changed from athomas to rjollos

Reassigning ticket after changing maintainer of plugin.

comment:3 Changed 4 years ago by fernan

  • Priority changed from normal to high

This IS important.

My repository is full of SQL-like code that contains table names and column names, which are separated by dots, e.g. 'table.column'.

Also, a dot is an important operator in many programming languages. Searching for tokens separated with dots (e.g. object.action()) is a common task that IMO should be supported in searches.

In both these examples it's not possible to replace the search terms:

  • searching for 'table' or 'object' alone is not useful
  • searching for 'column' or 'action' alone is also not useful

There are zillions of lines in my code referring to these items alone. I only want the lines containing both tokens, and preferably I want the tokens joined together with a dot, so that I'm sure of getting functionally relevant results.

BTW: thanks for the plugin!

comment:4 Changed 4 years ago by rjollos

My opinion is that this plugin is very broken, which is why I don't expect to get at this issue for some time. If you look at the other open tickets, there are some major show-stoppers. However, I'd be happy to apply a patch if you generate one.

comment:5 follow-up: Changed 4 years ago by fernan

Replying to rjollos:

> My opinion is that this plugin is very broken, which is why I don't expect to get at this issue for some time. If you look at the other open tickets, there are some major show-stoppers. However, I'd be happy to apply a patch if you generate one.

Hmm, I can see what you mean.

I edited 'tracreposearch.indexer.Indexer._strip' as suggested and changed the regular expression from '\w+' to '\S+', and reindexed. Now, searches kind of work.

i) Searching for 'table.column' is now OK (sometimes).

ii) But searching for Perl modules (which use '::' as a separator) doesn't work (no matches found), e.g. GD::Graph, Data::Dumper, etc. I can't see why ... \S+ is supposed to match any non-whitespace character, and that includes '::'. In fact, the regexp works fine in other contexts (perl, vim).

As you say, the plugin seems to be broken anyway, because intermittently a search produces an error for no apparent reason:

  • Running a search for 'Data::Dumper' produces no results, as mentioned
  • However, searching for 'Dumper' alone (without the quotes) produces the following error (note the added semicolon after the word):
Oops…
Trac detected an internal error:
KeyError: 'dumper;'
This is probably a local installation issue.

Python Traceback

Most recent call last:
File "/usr/local/lib/python2.6/site-packages/Trac-0.11.7-py2.6.egg/trac/web/main.py", line 450, in _dispatch_request
File "/usr/local/lib/python2.6/site-packages/Trac-0.11.7-py2.6.egg/trac/web/main.py", line 206, in dispatch
File "/usr/local/lib/python2.6/site-packages/Trac-0.11.7-py2.6.egg/trac/search/web_ui.py", line 107, in process_request
File "build/bdist.freebsd-6.4-RELEASE-p10-i386/egg/tracreposearch/search.py", line 112, in get_search_results
File "build/bdist.freebsd-6.4-RELEASE-p10-i386/egg/tracreposearch/search.py", line 82, in <lambda>
File "/usr/local/lib/python2.6/site-packages/tracreposearch-0.2-py2.6.egg/tracreposearch/indexer.py", line 97, in wrap
File "/usr/local/lib/python2.6/site-packages/tracreposearch-0.2-py2.6.egg/tracreposearch/indexer.py", line 280, in find_words
File "/usr/local/lib/python2.6/site-packages/tracreposearch-0.2-py2.6.egg/tracreposearch/indexer.py", line 42, in __getitem__
  • Running a search for 'Genes' (there's a perl module in my code named 'weight::Genes') produced the following error:
Oops…
Trac detected an internal error:
KeyError: "$gdi_genes_rs=$c->model('weight"
This is probably a local installation issue.

Python Traceback

Most recent call last:
File "/usr/local/lib/python2.6/site-packages/Trac-0.11.7-py2.6.egg/trac/web/main.py", line 450, in _dispatch_request
File "/usr/local/lib/python2.6/site-packages/Trac-0.11.7-py2.6.egg/trac/web/main.py", line 206, in dispatch
File "/usr/local/lib/python2.6/site-packages/Trac-0.11.7-py2.6.egg/trac/search/web_ui.py", line 107, in process_request
File "build/bdist.freebsd-6.4-RELEASE-p10-i386/egg/tracreposearch/search.py", line 112, in get_search_results
File "build/bdist.freebsd-6.4-RELEASE-p10-i386/egg/tracreposearch/search.py", line 82, in <lambda>
File "/usr/local/lib/python2.6/site-packages/tracreposearch-0.2-py2.6.egg/tracreposearch/indexer.py", line 97, in wrap
File "/usr/local/lib/python2.6/site-packages/tracreposearch-0.2-py2.6.egg/tracreposearch/indexer.py", line 280, in find_words
File "/usr/local/lib/python2.6/site-packages/tracreposearch-0.2-py2.6.egg/tracreposearch/indexer.py", line 42, in __getitem__

Maybe the KeyError has some clue about what's going on?
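One observation on the keys themselves: they look like the tokens a \S+ pattern would cut out of Perl source, with punctuation still attached. The sketch below is not the plugin's code. The sample lines are made up (the last one is modelled on the second error), and the lowercasing is an assumption based on the lowercase keys in the error messages:

```python
import re

NON_WHITESPACE = re.compile(r"\S+")


def tokens(line):
    # Assumption: tokens are lowercased before they are stored or looked up.
    return [token.lower() for token in NON_WHITESPACE.findall(line)]


print(tokens("use Data::Dumper;"))
# ['use', 'data::dumper;']
print(tokens("print Dumper $x;"))
# ['print', 'dumper', '$x;']
print(tokens("$gdi_genes_rs=$c->model('weight::Genes');"))
# ["$gdi_genes_rs=$c->model('weight::genes');"]
```

Trailing semicolons, quotes and brackets stay glued to the word, so a query for plain 'Data::Dumper' would not equal the token 'data::dumper;'. Note that \S+ alone keeps 'weight::genes' in one piece, so the key that stops just before '::' would have to come from some other step in the plugin; this does not explain the KeyError itself.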

Regarding your suggestion that I provide a patch: I would suggest changing the original regexp to '\S+'. That would be my patch.

Regarding the other problems I'm not proficient in Python so I can't offer much help, but I can certainly help debug reposearch ...

Cheers,

--
fernan

Attachment indexer.py.patch added 4 years ago by fernan

suggested change in regexp

comment:6 in reply to: ↑ 5 Changed 4 years ago by rjollos

  • Status changed from new to assigned

Replying to fernan:

> Regarding the other problems I'm not proficient in Python so I can't offer much help, but I can certainly help debug reposearch ...

I'll take a look at the patch shortly, and will definitely take you up on that offer to do some testing and debugging!

comment:7 Changed 4 years ago by rjollos

Thanks for this hint, it helped me with #8266 as well.

comment:8 Changed 4 years ago by rjollos

(In [9663]) Use \S in the regular expression that extracts words. \S will match any non-whitespace character, whereas \w only matches alphanumeric characters and the underscore. Refs #5938.
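The difference between the two character classes can be seen on a router-configuration style line. This is an illustrative sketch using Python's re module directly, not the code from the changeset, and the sample line is made up:

```python
import re

line = "ip address 192.168.0.1 255.255.255.0"

# \w+ stops at every dot, so an address is split into its octets.
word_tokens = re.findall(r"\w+", line)
print(word_tokens)
# ['ip', 'address', '192', '168', '0', '1', '255', '255', '255', '0']

# \S+ only stops at whitespace, so each address survives as one token.
nonspace_tokens = re.findall(r"\S+", line)
print(nonspace_tokens)
# ['ip', 'address', '192.168.0.1', '255.255.255.0']
```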

comment:9 Changed 3 years ago by rjollos

Hello,

I took over maintainership of this plugin from athomas some time ago. There is a significant amount of work to do on this plugin, and I don't foresee having the time to do it all.

helend has written the TracSuposePlugin, which seems like a much better solution. Rather than writing the repository search functionality from scratch, a Trac interface to an existing repository search tool has been created. Rather than throwing more effort at this plugin, I'd prefer to help helend with enhancements to the TracSuposePlugin, or spend my time on other Trac plugin projects altogether.

I'd like to get some feedback and hear if anyone knows of a compelling reason to continue this project rather than moving to the TracSuposePlugin. Is there functionality in this plugin that doesn't exist in the TracSuposePlugin? I'm open to hearing all opinions and suggestions.

I'll leave these tickets open for about a week, but in all likelihood will close all of them and deprecate the plugin.

Thanks for your time,

- Ryan

comment:10 Changed 2 years ago by rjollos

  • Status changed from assigned to new
