Opened 15 years ago

Closed 13 years ago

Last modified 13 years ago

#7046 closed defect (wontfix)

Support multi-repository in 0.12

Reported by: victor
Owned by: Ryan J Ollos
Priority: normal
Component: RepoSearchPlugin
Severity: normal
Keywords:
Cc:
Trac Release: 0.12

Description

Can anybody implement multi-repository support in Trac 0.12?

Attachments (0)

Change History (10)

comment:1 Changed 15 years ago by Ryan J Ollos

There is not much hope of this happening anytime soon. I adopted this plugin in hopes of getting it to a usable state, but that is bound to be a very time-consuming project. It looks like we need to start from scratch and plan a project that can overcome this plugin's limitations.

comment:2 Changed 14 years ago by Ryan J Ollos

You might take a look at #7545, for which a 0.12 patch was just committed, and discuss with the reporter of that ticket whether it is working now under 0.12.

comment:3 Changed 14 years ago by anonymous

We have this plugin working successfully under 0.12 with the #7545 patch, using just a single "default" repository (upgraded from trac 0.11). The multi-repo support adds new arguments to get_repository():

def get_repository(self, reponame=None, authname=None):

So it would seem you could search multiple repositories too, as long as you have permissions. Thanks for the quick commit, btw!
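
A minimal sketch of that idea, assuming only the signature quoted above (a missing reponame means the default repository). `StubEnvironment`, `search_repositories`, and the sample paths are hypothetical stand-ins for illustration, not Trac's real classes, and each repository is modelled as a plain list of paths.

```python
class StubEnvironment:
    """Hypothetical stand-in for a Trac environment; not Trac's real class."""

    def __init__(self, repositories, default="(default)"):
        self._repositories = dict(repositories)
        self._default = default

    def get_repository(self, reponame=None, authname=None):
        # Same signature as quoted above: no name means the default repository.
        # authname is accepted but ignored by this stub.
        return self._repositories.get(reponame or self._default)


def search_repositories(env, reponames, authname, needle):
    """Collect (reponame, path) pairs whose path contains `needle`."""
    hits = []
    for reponame in reponames:
        repo = env.get_repository(reponame=reponame, authname=authname)
        if repo is None:
            continue  # unknown repository in this stub
        hits.extend((reponame, path) for path in repo if needle in path)
    return hits


env = StubEnvironment({
    "(default)": ["trunk/setup.py"],
    "plugins": ["reposearch/search.py", "reposearch/indexer.py"],
})
```

Calling `env.get_repository()` with no arguments keeps working for a single-repository setup, while `search_repositories` fans the same call out over several names.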

comment:4 Changed 14 years ago by Ryan J Ollos

Summary: Support multireport in trac 0.12 → Support multi-repository in 0.12

comment:5 Changed 13 years ago by Ryan J Ollos

Hello,

I took over maintainership of this plugin from athomas some time ago. There is a significant amount of work to do on this plugin, and I don't foresee having the time to do it all.

helend has written the TracSuposePlugin, which seems like a much better solution. Rather than writing the repository search functionality from scratch, a Trac interface to an existing repository search tool has been created. Rather than throwing more effort at this plugin, I'd prefer to help helend with enhancements to the TracSuposePlugin, or spend my time on other Trac plugin projects altogether.

I'd like to get some feedback and hear if anyone knows of a compelling reason to continue this project rather than moving to the TracSuposePlugin. Is there functionality in this plugin that doesn't exist in the TracSuposePlugin? I'm open to hearing all opinions and suggestions.

I'll leave these tickets open for about a week, but in all likelihood will close all of them and deprecate the plugin.

Thanks for your time,

Ryan

comment:6 Changed 13 years ago by ejucovy

I was able to get the Repo Search plugin working across multiple Git repositories with the following patch:

  • tracreposearch/search.py

    @@ -64,7 +64,15 @@
         def get_search_results(self, req, query, filters):
             if 'repo' not in filters:
                 return
    -        repo = self.env.get_repository(authname=req.authname)
    +        from trac.versioncontrol import RepositoryManager
    +        repos = RepositoryManager(self.env).get_all_repositories()
    +        results = []
    +        for repo in repos:
    +            results.extend( list(self.get_search_results_for_repo(req, query, filters, repo)) )
    +        return results
    +
    +    def get_search_results_for_repo(self, req, query, filters, reponame):
    +        repo = self.env.get_repository(reponame=reponame, authname=req.authname)
             if not isinstance(query, list):
                 query = query.split()
             query = [q.lower() for q in query]
    @@ -128,6 +136,6 @@
                         if found:
                             break
     
    -                yield (self.env.href.browser(node.path) + (found and '#L%i' % found or ''),
    -                       node.path, change.date, change.author,
    +                yield (self.env.href.browser(repo.reponame, node.path) + (found and '#L%i' % found or ''),
    +                       "%s (in %s)" % (node.path, repo.reponame), change.date, change.author,
                            shorten_result(content, query))

I don't recommend committing it as-is, because I haven't yet tested it with a single-repo setup or with backends other than Git. I also haven't tested the Indexer support yet. Still, I wanted to post my progress so far in case anyone else finds it useful.

comment:7 Changed 13 years ago by ejucovy

Here's an updated patch that also makes the Indexer work with multiple repos:

  • tracreposearch/indexer.py

    @@ -39,7 +39,10 @@
             key = key.encode('utf-8')
             if key in self._cache:
                 return self._cache[key]
    -        return self._cache.setdefault(key, set(self.dbm[key].decode('utf-8').split(pathsep)))
    +        try:
    +            return self._cache.setdefault(key, set(self.dbm[key].decode('utf-8').split(pathsep)))
    +        except:
    +            return []
     
         def __setitem__(self, key, value):
             key = key.encode('utf-8')
    @@ -102,10 +105,12 @@
     class Indexer:
         _strip = re.compile(r'\S+',re.U)
     
    -    def __init__(self, env):
    +    def __init__(self, env, reponame):
             self.env = env
    -        self.repo = self.env.get_repository()
     
    +        from trac.versioncontrol import RepositoryManager
    +        self.repo = self.env.get_repository(reponame=reponame)
    +
             if not self.env.config.get('repo-search', 'index',
                                        os.getenv('PYTHON_EGG_CACHE', None)):
                 raise TracError("Repository search plugin indexer is not " \
    @@ -115,9 +120,12 @@
     
             # TODO Should this use the repo location as well?
             env_id = '%08x' % abs(hash(self.env.path))
    -        self.index_dir = self.env.config.get('repo-search', 'index',
    -                         os.path.join(os.getenv('PYTHON_EGG_CACHE', ''),
    -                                      env_id + '.reposearch.idx'))
    +        self.index_dir = os.path.join(
    +            self.env.config.get('repo-search', 'index',
    +                                os.path.join(os.getenv('PYTHON_EGG_CACHE', ''),
    +                                             env_id + '.reposearch.idx')),
    +            self.repo.reponame)
    +
             self.env.log.debug('Repository search index: %s' % self.index_dir)
             self.minimum_word_length = int(self.env.config.get('repo-search',
                                            'minimum-word-length', 3))
    @@ -165,7 +173,7 @@
         def need_reindex(self):
             return not hasattr(self, 'meta') \
                 or self.repo.youngest_rev != \
    -               int(self.meta.get('last-repo-rev', -1)) \
    +               self.meta.get('last-repo-rev', -1) \
                 or self.env.config.get('repo-search', 'include', '') \
                    != self.meta.get('index-include', '') \
                 or self.env.config.get('repo-search', 'exclude', '') \
    @@ -247,7 +255,8 @@
                         self._invalidate_file(node.path)
                         self._reindex_node(node)
                 new_files.add(node.path)
    -
    +            self.sync()
    +
             # All files that don't match the new filter criteria must be purged
             # from the index
             invalidated_files = set(self.files.keys())

  • tracreposearch/search.py

    @@ -64,7 +64,15 @@
         def get_search_results(self, req, query, filters):
             if 'repo' not in filters:
                 return
    -        repo = self.env.get_repository(authname=req.authname)
    +        from trac.versioncontrol import RepositoryManager
    +        repos = RepositoryManager(self.env).get_all_repositories()
    +        results = []
    +        for repo in repos:
    +            results.extend( list(self.get_search_results_for_repo(req, query, filters, repo)) )
    +        return results
    +
    +    def get_search_results_for_repo(self, req, query, filters, reponame):
    +        repo = self.env.get_repository(reponame=reponame, authname=req.authname)
             if not isinstance(query, list):
                 query = query.split()
             query = [q.lower() for q in query]
    @@ -76,7 +84,7 @@
             # Use indexer if possible, otherwise fall back on brute force search.
             try:
                 from tracreposearch.indexer import Indexer
    -            self.indexer = Indexer(self.env)
    +            self.indexer = Indexer(self.env, reponame)
                 self.indexer.reindex()
                 walker = lambda repo, query: [repo.get_node(filename) for filename
                                               in self.indexer.find_words(query)]
    @@ -128,6 +136,6 @@
                         if found:
                             break
     
    -                yield (self.env.href.browser(node.path) + (found and '#L%i' % found or ''),
    -                       node.path, change.date, change.author,
    +                yield (self.env.href.browser(repo.reponame, node.path) + (found and '#L%i' % found or ''),
    +                       "%s (in %s)" % (node.path, repo.reponame), change.date, change.author,
                            shorten_result(content, query))

Definitely still don't recommend committing as-is. I'll see if I can find the time to test it against other repo backends, and against a single-repo setup.

With this patch, each distinct repository gets its own index in a separate subdirectory of the index path, and the search iterates over all the indexes.
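
The directory layout described here can be sketched as follows. This is an illustration only: `index_dir_for`, `index_dirs`, and the `default_name` fallback are hypothetical names, and the fallback for an unnamed repository is an assumption (the patch in this comment joins `self.repo.reponame` onto the index path directly).

```python
import os


def index_dir_for(base_dir, reponame, default_name="default"):
    """Return the index directory for one repository.

    `default_name` is a hypothetical fallback for a repository whose name
    is empty; without it, os.path.join(base_dir, "") would just return
    base_dir with a trailing separator.
    """
    return os.path.join(base_dir, reponame or default_name)


def index_dirs(base_dir, reponames):
    """Map each repository name to its own index subdirectory."""
    return {name: index_dir_for(base_dir, name) for name in reponames}


dirs = index_dirs("/var/cache/reposearch.idx", ["", "plugins", "docs"])
```

Because every repository resolves to a distinct subdirectory, one repository's index can be rebuilt or deleted without touching the others.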

The initial indexing can become very slow and memory-intensive here. I added an extra self.sync() call after every node is walked, in an attempt to reduce the memory usage, but it is still slow and memory-hungry. I'll see if I can come up with any improvements in this area.
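
One possible variation, shown only as a sketch and not as what the patch does: instead of syncing after every node, buffer postings in memory and flush every N nodes, which trades a bounded amount of memory for fewer writes. `BufferedIndex`, `flush_every`, and the dict-like `store` are hypothetical; the plugin's real index classes are not reproduced here.

```python
class BufferedIndex:
    """Hypothetical write buffer in front of a dict-like persistent store."""

    def __init__(self, store, flush_every=100):
        self.store = store              # any dict-like object
        self.flush_every = flush_every  # nodes to buffer before flushing
        self.pending = {}               # word -> set of paths not yet written
        self.nodes_since_flush = 0

    def add_node(self, path, words):
        """Record that `path` contains each of `words`."""
        for word in words:
            self.pending.setdefault(word, set()).add(path)
        self.nodes_since_flush += 1
        if self.nodes_since_flush >= self.flush_every:
            self.sync()

    def sync(self):
        """Merge buffered postings into the store and empty the buffer."""
        for word, paths in self.pending.items():
            self.store[word] = set(self.store.get(word, ())) | paths
        self.pending.clear()
        self.nodes_since_flush = 0


store = {}
index = BufferedIndex(store, flush_every=2)
index.add_node("a.py", ["foo"])
store_after_first_node = dict(store)   # nothing flushed yet
index.add_node("b.py", ["foo", "bar"])  # second node triggers a flush
```

With `flush_every=1` this degenerates to the sync-after-every-node behaviour described in the comment; larger values keep more postings in memory between writes.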

comment:8 Changed 13 years ago by Ryan J Ollos

ejucovy, since you've forked this plugin, should we close this ticket now? Since you are working on improving the plugin, I'm inclined to contribute my future efforts towards improving your fork rather than continue to work on this plugin.

comment:9 in reply to: 8 Changed 13 years ago by ejucovy

Resolution: wontfix
Status: new → closed

Replying to rjollos:

> ejucovy, since you've forked this plugin, should we close this ticket now?

Yes, I think that makes sense. As you can probably tell, the MultiRepoSearchPlugin grew out of my patch on this ticket; it ended up feeling like too big a design change to implement incrementally with confidence.

So I think the right resolution to this ticket is "if you are using Trac 0.12+ with non-SVN repos and/or multiple repos, try using MultiRepoSearchPlugin."

> Since you are working on improving the plugin, I'm inclined to contribute my future efforts towards improving your fork rather than continue to work on this plugin.

That would be great!

comment:10 in reply to: 9 Changed 13 years ago by Ryan J Ollos

Replying to ejucovy:

> So I think the right resolution to this ticket is "if you are using Trac 0.12+ with non-SVN repos and/or multiple repos, try using MultiRepoSearchPlugin."

Great. I added that to the wiki page.
