Opened 8 years ago

# repository search seems to be impacted by presence of branches

• Reported by: patrickm
• Owned by: rjollos
• Priority: normal
• Component: RepoSearchPlugin
• Severity: major
• Keywords: unintuitive performance drop
• Cc: jorge.vargas+trac@…
• Trac Release: 0.10

### Description

Repository search seems to be impacted: re-indexing is much slower once branches have been made from the trunk.

I've seen this twice, in two small repositories on modestly specced machines.

Looking at open files, the indexing appears to loiter at a revision file (the repository uses fsfs) corresponding to a large commit to the URL that was branched.

The indexing on the repository seemed much faster prior to this change.

I can roll back one of the repositories and test this more scientifically, if that would be helpful.

Or try something else, like attaching a profiler of some kind?

Regards,

Patrick

### comment:1 Changed 8 years ago by patrickm

Looking into it: the algorithm is perfectly correct, as far as it goes.
When a branch or tag is created, then the new branch or tag is indexed.

A nice possible feature would be for the Node `get_entries` call to accept a filter so the returned list could be shortened, but that lives in the Subversion-supplied Python bindings...
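Lacking such a filter in the bindings, the same effect can at least be emulated client-side. A minimal sketch; `FakeNode` and the predicate are illustrative stand-ins, not part of Trac's real `Node` API:

```python
class FakeNode:
    """Stand-in for Trac's versioncontrol Node (illustrative only)."""
    def __init__(self, children):
        self._children = children

    def get_entries(self):
        # Trac's real Node.get_entries() yields child Node objects;
        # the Subversion bindings always return the full listing.
        return iter(self._children)


def filtered_entries(node, predicate):
    """Client-side filter: skip entries the indexer does not need."""
    return [child for child in node.get_entries() if predicate(child)]


node = FakeNode(["trunk/a.py", "branches/1.0/a.py", "tags/1.0/a.py"])
trunk_only = filtered_entries(node, lambda path: path.startswith("trunk/"))
```

This still pays the cost of the full listing from the bindings, of course; it only spares the indexer from processing entries it will discard.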

The one remaining optimisation is to make re-indexing new tags/branches as cheap as creating them in subversion.

Unless created from a working copy, a branch will be identical to its source location, so all that needs to be done is to duplicate the index references from the original location to the new location.

This assumes a lot about the storage of the index, and what APIs are available from the svn wrappers, but it would be great if tagging didn't end up penalising the repository search.

I noticed this through having a dozen or so branches/tags of several thousand files, and the indexing process was using a lot of memory when I tried the first index. I was actually worried about the process running out of address space!

If this enhancement were possible, it would make the algorithm about as optimal as I can see, from the perspective of repository queries and file indexes.

### comment:2 Changed 8 years ago by athomas

• Status changed from new to assigned

Thanks for the info, I hadn't really thought about it in detail but it makes sense. I'll have a look at the API and see if there is a more efficient way of handling copies.

### comment:3 Changed 7 years ago by athomas

(In [1881]) Not for the faint of heart, but if anybody wants to test the pyndexter branch
it would be most useful to get some feedback.

You will need the refactoring branch of pyndexter, available from
here.

References #362, #371, #385 and #388.

### comment:4 Changed 7 years ago by tmitchell

I was having some pretty severe issues with the repository search when I added some tagged releases, so I am trying out the pyndexter branch. I've got hype set up, and my logs show the plugin is happily indexing my repository; however, I get a 500 error when the searches go through. I've enabled debug logging in both Apache and Trac but can't get more information than this:

```
[Mon Mar 05 17:56:12 2007] [error] [client 127.0.0.1] malformed header from script. Bad header=test: trac.cgi, referer: http://localhost/trac.cgi/seas/search
```


Happy to debug further, if you can advise how to proceed.
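Apache's "malformed header from script" error means the CGI script emitted something other than `Name: value` header lines before the blank line that ends the header block; here the stray line was literally `test`, which suggests a leftover debug print. A minimal sketch of the failure mode (the `cgi_response` helper is illustrative, not trac.cgi code):

```python
def cgi_response(leak_debug=False):
    """Build the text a CGI script sends to the web server: header
    lines, a blank line, then the body."""
    lines = []
    if leak_debug:
        # A stray debug line before the headers is what produces
        # Apache's "malformed header from script. Bad header=test:".
        lines.append("test")
    lines.append("Content-Type: text/plain")
    lines.append("")  # blank line terminates the header block
    lines.append("search results here")
    return "\n".join(lines)


good = cgi_response()
bad = cgi_response(leak_debug=True)
```

If this is the cause, grepping the pyndexter branch for a bare `print` of `test` on the search code path would be a reasonable place to start.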

### comment:5 Changed 7 years ago by patrickmmartin@…

Hi: I'm more and more convinced that, for a user-proof solution, any indexer plugged into the svn API would need some cheap way of identifying when a revision of a file is a "copy with history", and then simply duplicate the search entries (or, in a restructured database, simply add links to the index entries for that file and revision).
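Trac's versioncontrol layer does expose the needed signal: `Changeset.get_changes()` yields `(path, kind, change, base_path, base_rev)` tuples, where the change field distinguishes copies from plain adds and edits. A schematic re-index pass using that tuple shape; the dict index and `reindex` helper are illustrative assumptions, not RepoSearchPlugin code:

```python
COPY = "copy"  # mirrors Trac's Changeset.COPY marker
ADD = "add"


def reindex(index, changes, read_and_index):
    """Duplicate index entries for copy-with-history changes; fall
    back to the expensive content read for everything else."""
    for path, kind, change, base_path, base_rev in changes:
        if change == COPY and base_path in index:
            index[path] = index[base_path]       # cheap: share references
        else:
            index[path] = read_and_index(path)   # expensive: read content


index = {"trunk/a.py": ["doc-1"]}
reads = []


def read_and_index(path):
    reads.append(path)  # track how many full content reads happen
    return ["doc-for-" + path]


reindex(index,
        [("branches/1.0/a.py", "file", COPY, "trunk/a.py", 10),
         ("trunk/new.py", "file", ADD, None, None)],
        read_and_index)
```

With this shape, indexing a tag of a thousand files costs a thousand dictionary assignments rather than a thousand content reads, which matches the "as cheap as creating them in subversion" goal from comment 1.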

There are several issues with the brute-force approach. One is that any user who branches and tags enough can currently blow the available memory, or the CGI execution timeout, by supplying an (apparently) huge number of files to index.

The other is that, because of the way Subversion uses deltas, even if the memory and time budgets were not exhausted, file accesses are made to all the previous revisions contributing to a given revision, so it is very easy to generate a huge number of file reads.

The only generalised solution I can see is to make the costs to the user and to the indexer scale the same way for "copy with history" - this is going to be a challenge...

### comment:6 Changed 7 years ago by jorge vargas

This plugin seems interesting, but with this issue it's very unlikely people will use it.

### comment:7 Changed 4 years ago by rjollos

• Owner changed from athomas to rjollos
• Status changed from assigned to new

Reassigning ticket after changing maintainer of plugin.

### comment:8 Changed 3 years ago by rjollos

Hello,

I took over maintainership of this plugin from athomas some time ago. There is a significant amount of work to do on this plugin, and I don't foresee having the time to do it all.

helend has written the TracSuposePlugin, which seems like a much better solution. Rather than writing the repository search functionality from scratch, a Trac interface to an existing repository search tool has been created. Rather than throwing more effort at this plugin, I'd prefer to help helend with enhancements to the TracSuposePlugin, or spend my time on other Trac plugin projects altogether.

I'd like to get some feedback and hear if anyone knows of a compelling reason to continue this project rather than moving to the TracSuposePlugin. Is there functionality in this plugin that doesn't exist in the TracSuposePlugin? I'm open to hearing all opinions and suggestions.

I'll leave these tickets open for about a week, but in all likelihood will close all of them and deprecate the plugin.

• Ryan
