Ticket #746 (assigned task)

Opened 3 years ago

Last modified 6 days ago

Improve performance of GitPlugin

Reported by: hvr
Assigned to: hvr (accepted)
Priority: high
Component: GitPlugin
Severity: critical
Keywords:
Cc:
Trac Release: 0.10

Description

The current implementation relies heavily on git-core executables, so every information retrieval that needs to access the git repository costs a fork(), which makes it quite an expensive operation. As a result, performance is quite low right now.

Better performance should be achievable by:

  • caching meta data
  • optimizing GitNode.get_content_length()
  • accessing the git repository more efficiently

Attachments

Change History

09/29/06 12:47:23 changed by hvr

  • priority changed from normal to high.
  • status changed from new to assigned.
  • severity changed from normal to critical.

09/29/06 14:26:50 changed by hvr

(In [1321]) performance improvements

addresses #746

09/29/06 15:03:39 changed by hvr

(In [1322]) yet another minor performance improvement for big repositories;

addresses #746

02/06/08 15:39:50 changed by hvr

(In [3185])

  • changed next revision/previous revision to point to the next/previous changeset in a flattened history as provided by git-rev-list --all
  • implemented an in-memory commit tree cache in order to speed up typical Trac repository access patterns (addresses #746)
  • allow wrapping GitRepository in a CachedRepository, thus storing meta-data in Trac's SQL db (addresses #746)
  • implemented new [git] options to control caching:
    [git]
    cached_repository = true
    persistent_cache = true
  • various other fixes and cleanups
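
As an illustration of the in-memory commit tree cache idea, the following minimal sketch parses the text that `git rev-list --all --parents` prints (one commit per line, followed by its parent ids) into lookup tables that can be held in memory. It is not the code from [3185]; the function name `build_commit_graph` and the sample ids are hypothetical, and it is written for current Python rather than the Python 2.5 of the time.

```python
from collections import defaultdict


def build_commit_graph(rev_list_output):
    """Parse `git rev-list --all --parents` output into parent/child maps.

    Hypothetical helper for illustration only; not GitPlugin's actual code.
    Each input line has the form: <commit> [<parent> ...]
    """
    parents = {}
    children = defaultdict(list)
    for line in rev_list_output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        sha, sha_parents = fields[0], fields[1:]
        parents[sha] = sha_parents
        for parent in sha_parents:
            children[parent].append(sha)
    return parents, dict(children)


# Made-up ids standing in for real commit hashes.
sample = "c3 c2\nc2 c1\nc1\n"
parents, children = build_commit_graph(sample)
```

Once the maps are built, parent and child lookups become dictionary accesses instead of a new git process per request.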

02/08/08 22:02:17 changed by hvr

(In [3199]) GitPlugin: call popen3 with sequence instead of string as command, in order to avoid shell-overhead (addresses #746)
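
The difference between passing a string and passing a sequence can be sketched as follows. This is not the code from [3199]: it uses the `subprocess` module, which replaced `popen3` in later Python versions, and the helper name `run_command` is hypothetical. The Python interpreter stands in for the git binary so that the example runs without a repository.

```python
import subprocess
import sys


def run_command(argv):
    """Run a command given as a sequence of arguments and return its stdout.

    Because argv is a list and shell=False (the default), the program is
    executed directly; no intermediate shell process is started and no
    shell quoting is needed.
    """
    result = subprocess.run(argv, capture_output=True, check=True)
    return result.stdout


# Stand-in command; with git this could be e.g. ["git", "rev-list", "--all"].
out = run_command([sys.executable, "-c", "print('ok')"])
```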

02/15/08 10:55:03 changed by hvr

additional notes:

  • try to create an additional libgit-thin-based Storage in PyGIT.py to evaluate whether the exec+fork+parse overhead is still significant
  • most external git calls can now be avoided thanks to extensive caching of meta-data; one of the remaining speed killers is listing directories in the source browser. The cat-file -s calls can be optimized by using the -l option to ls-tree (Git 1.5.3+), but having to call rev-list for each folder element still remains an issue. Maybe it will be possible to get an enhancement to ls-tree merged upstream to provide that information as well.
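
To illustrate the ls-tree point: `git ls-tree -l` prints each entry as `<mode> <type> <object> <size>`, then a tab and the path, with `-` as the size for non-blob entries. One call per directory can therefore replace one `cat-file -s` call per file. The sketch below parses that output; the function name `parse_ls_tree_long` and the sample object ids are hypothetical, and it assumes plain file names (git quotes unusual path characters unless `-z` is used).

```python
def parse_ls_tree_long(output):
    """Parse `git ls-tree -l` output into a list of entry dicts.

    Hypothetical helper for illustration only; not GitPlugin's actual code.
    """
    entries = []
    for line in output.splitlines():
        if not line:
            continue
        # Metadata and path are separated by a single tab.
        meta, name = line.split("\t", 1)
        mode, obj_type, sha, size = meta.split()
        entries.append({
            "name": name,
            "mode": mode,
            "type": obj_type,
            "sha": sha,
            # Trees have no size; git prints "-" for them.
            "size": None if size == "-" else int(size),
        })
    return entries


# Made-up object ids; the size column is space-padded as git prints it.
sample = (
    "100644 blob 1111111111111111111111111111111111111111     120\tREADME\n"
    "040000 tree 2222222222222222222222222222222222222222       -\tsrc\n"
)
entries = parse_ls_tree_long(sample)
```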

04/07/08 14:42:37 changed by osmaker+gitplugin@gmail.com

I thought I should mention this since hvr is considering a similar thing with libgit-thin.

I've been working on a small C library to help with performance on this and another project. (I chose not to use libgit-thin, as it has memory leaks which can't be fixed without some significant changes to Git itself.)

The goals for it are simple:

  • Read Git objects (no write support).
  • Do so with high performance, providing various methods which can be tuned to the specific application/command.

I've created a Python wrapper around the library and have implemented some basic Git commands with it in Python (ls-tree, rev-list, cat-file, etc). Already, most calls are orders of magnitude faster than using the git binaries via popen3.

I plan to have a public, working version available by the end of April. (And hopefully some simple patches for low-performance areas of GitPlugin.)

(follow-up: ↓ 9 ) 01/01/09 15:45:59 changed by jarin.franek@post.cz

I tried Trac with the git plugin on the Linux git repo. Without the cache (i.e. cached_repository=false, persistent_cache=false), everything worked well. Turning the cache on, however, gave me a headache: I could not get the timeline after 10 minutes of 100% CPU load (I gave up then). Browsing the source showed only revisions 3 or 4 years old (still at 100% CPU).

I guess there are some bugs in the caching code. Should I file a new bug report for this, or is it fine to leave it on this ticket (as it is about performance)?

To replicate the issue:

  1. have trac 0.11.2.1, git-plugin 0.11.0.1 svn5076, git 1.6.0.6, python 2.5.2 installed
  2. git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-2.6
  3. mkdir trac
  4. trac-admin ./trac initenv (fill in the data appropriately)
  5. in the trac environment config file:
       1. enable git plugin
       2. enable cache
       3. set git_bin to the full path of the git binary (it stops working with only 'git' there, at least for me)
  6. launch tracd --port <your.choice> <path_to_the_trac_env>
  7. try to access the project timeline page via e.g. Firefox
  8. 100% CPU, no page shown

and

  7b. try to access the code browser at http://localhost:<your.port>/trac/browser
  8b. very old revisions shown, 100% CPU

Note that with a small git repo I had no difficulties using the cache.

(in reply to: ↑ 8 ) 03/15/10 22:41:09 changed by georgyo

I was having serious performance issues, and caching was the problem.

Once a repo grows past 500 commits, Trac becomes very slow if caching is enabled.

