Opened 19 years ago
Last modified 11 months ago
#362 new defect
repository search seems to be impacted by presence of branches
Reported by: | Patrick Martin | Owned by: | Alex Smith |
---|---|---|---|
Priority: | normal | Component: | RepoSearchPlugin |
Severity: | major | Keywords: | unintuitive performance drop |
Cc: | jorge.vargas+trac@… | Trac Release: | 0.10 |
Description
Repository search re-indexing seems to be much slower once branches have been made from the trunk.
I've seen this twice, in two small repositories on low-spec machines.
Looking at the open files, the indexer appears to stall on a revision file (the repository uses FSFS) belonging to a large commit to the URL that was branched.
The indexing on the repository seemed much faster prior to this change.
I can roll back one of repositories and test this more scientifically, if that would be helpful.
Or try something else, like attaching a profiler of some kind?
Regards,
Patrick
Attachments (0)
Change History (11)
comment:1 Changed 18 years ago by
comment:2 Changed 18 years ago by
Status: | new → assigned |
---|
Thanks for the info, I hadn't really thought about it in detail but it makes sense. I'll have a look at the API and see if there is a more efficient way of handling copies.
comment:3 Changed 18 years ago by
comment:4 Changed 18 years ago by
I was having some pretty severe issues with the repository search when I added some tagged releases. I am trying out the pyndexter branch, I've got hype set up and my logs show the plugin is happily indexing my repository, however I get a 500 error when the searches go through. I've enabled debug logging in both Apache and Trac but can't get more information than this:
[Mon Mar 05 17:56:12 2007] [error] [client 127.0.0.1] malformed header from script. Bad header=test: trac.cgi, referer: http://localhost/trac.cgi/seas/search
Happy to debug further, if you can advise how to proceed.
comment:5 Changed 18 years ago by
Hi: I'm increasingly convinced that for a user-proof solution, any indexer plugged into the svn API needs a cheap way of identifying when a revision of a file is a "copy with history", and then simply duplicating the search entries, or, in a restructured database, simply adding links to the existing index entries from that file and revision.
There are several issues with the brute-force approach: one is that, currently, any user who branches and tags enough can blow the available memory or the CGI execution time-out by supplying an (apparently) huge number of files to index.
The other is that, because of the way Subversion uses deltas, even if the memory and time budgets were not exhausted, indexing triggers file accesses to all the previous revisions contributing to a given revision, so it is very easy to generate a huge number of file reads.
The only generalised solution I can see is to make the costs to the user and the indexer scale the same way for "copy with history" - this is going to be a challenge...
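The copy-detection idea above can be sketched roughly as follows. This is a minimal illustration with plain data structures: the change-tuple shape loosely mirrors Trac's `Changeset.get_changes()`, but the index dictionary and the `'copy'` handling here are assumptions for illustration, not the plugin's actual code.

```python
def index_changeset(index, content_of, changes):
    """Apply a changeset to a toy search index.

    index:      {(path, rev): list of tokens}
    content_of: {(path, rev): file text} -- stands in for repository reads
    changes:    iterable of (path, change, base_path, base_rev, rev)
    """
    for path, change, base_path, base_rev, rev in changes:
        if change == 'copy' and (base_path, base_rev) in index:
            # Cheap path: a "copy with history" is identical to its
            # source, so alias the existing entries -- no file reads.
            index[(path, rev)] = index[(base_path, base_rev)]
        else:
            # Expensive path: fetch and tokenise the node's content.
            index[(path, rev)] = content_of[(path, rev)].split()
    return index
```

With this shape, branching `trunk` into `branches/x` adds index entries without touching any revision files, which is exactly the cost profile the comment asks for.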
comment:6 Changed 18 years ago by
Cc: | jorge.vargas+trac@… added; anonymous removed |
---|
This plugin seems interesting, but with this issue it's very unlikely people will use it.
comment:7 Changed 15 years ago by
Owner: | changed from Alec Thomas to Ryan J Ollos |
---|---|
Status: | assigned → new |
Reassigning ticket after changing maintainer of plugin.
comment:8 Changed 13 years ago by
Hello,
I took over maintainership of this plugin from athomas some time ago. There is a significant amount of work to do on this plugin, and I don't foresee having the time to do it all.
helend has written the TracSuposePlugin, which seems like a much better solution. Rather than writing the repository search functionality from scratch, a Trac interface to an existing repository search tool has been created. Rather than throwing more effort at this plugin, I'd prefer to help helend with enhancements to the TracSuposePlugin, or spend my time on other Trac plugin projects altogether.
I'd like to get some feedback and hear if anyone knows of a compelling reason to continue this project rather than moving to the TracSuposePlugin. Is there functionality in this plugin that doesn't exist in the TracSuposePlugin? I'm open to hearing all opinions and suggestions.
I'll leave these tickets open for about a week, but in all likelihood will close all of them and deprecate the plugin.
Thanks for your time,
- Ryan
comment:9 Changed 10 years ago by
Owner: | changed from Ryan J Ollos to anonymous |
---|
comment:10 Changed 11 months ago by
Owner: | changed from anonymous to Alex Smith |
---|---|
Status: | new → assigned |
comment:11 Changed 11 months ago by
Status: | assigned → new |
---|
Looking into it: the algorithm is perfectly correct, as far as it goes. When a branch or tag is created, the new branch or tag is indexed.
A nice possible feature would be for the Node get_entries call to accept a filter so the returned list could be shortened, but that lives in the Subversion-supplied Python bindings...
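Pending such a change in the bindings, the filtering could at least be done lazily on the Python side. In this sketch, `node.get_entries()` and the entry values are assumed interfaces modelled on the call named above; only the generator wrapper is the point.

```python
def filtered_entries(node, predicate):
    """Yield only the entries the indexer actually needs to visit.

    Avoids materialising the full entry list in the caller, though the
    bindings may still build it internally -- a true fix needs the
    filter pushed down into get_entries itself.
    """
    for entry in node.get_entries():
        if predicate(entry):
            yield entry
```

A caller could, for example, pass a predicate that skips paths already indexed at their copy source.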
The one remaining optimisation is to make re-indexing new tags/branches as cheap as creating them in Subversion.
Unless it is created from a working copy, a branch will be identical to the previous location, so all that needs to be done is to duplicate the original location's index references for the new location.
This assumes a lot about the storage of the index, and what APIs are available from the svn wrappers, but it would be great if tagging didn't end up penalising the repository search.
I noticed this through having a dozen or so branches/tags of several thousand files, and the indexing process was using a lot of memory when I tried the first index. I was actually worried about the process running out of address space!
If this enhancement were possible, this would make the algorithm about as optimal as I can see, from the perspective of repository queries and file indexes.