Opened 9 years ago

Last modified 3 years ago

#746 assigned task

Improve performance of GitPlugin

Reported by: hvr Owned by: stuge
Priority: high Component: GitPlugin
Severity: critical Keywords:
Cc: gseanmcg@…, stuge Trac Release: 0.10


The current implementation relies heavily on git-core executables, so every retrieval of information from the git repository costs a fork(), which is quite expensive. As a result, performance is currently quite low.

Better performance should be achievable by:

  • caching meta data
  • optimizing GitNode.get_content_length()
  • accessing the git repository more efficiently

Attachments (0)

Change History (14)

comment:1 Changed 9 years ago by hvr

  • Priority changed from normal to high
  • Severity changed from normal to critical
  • Status changed from new to assigned

comment:2 Changed 9 years ago by hvr

(In [1321]) performance improvements

addresses #746

comment:3 Changed 9 years ago by hvr

(In [1322]) yet another minor performance improvement for big repositories;

addresses #746

comment:4 Changed 7 years ago by hvr

(In [3185])

  • changed next revision/previous revision to point to next/previous changeset in a flattened history as provided by git-rev-list --all
  • implemented in-memory commit tree cache in order to speed up typical Trac repository access patterns (addresses #746)
  • allow wrapping GitRepository in a CachedRepository, and thus storing meta-data in Trac's sql db (addresses #746)
  • implemented new [git] options to control caching:
    cached_repository = true
    persistent_cache = true
  • various other fixes and cleanups
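To illustrate the idea of an in-memory commit tree cache, here is a minimal sketch. It is not the code from [3185]; the function name build_commit_cache and the returned structures are hypothetical. It assumes the cache is filled from the text output of a single `git rev-list --all --parents` call, where each line holds a commit id followed by its parent ids, so that later parent/child and next/previous lookups need no further external git calls.

```python
def build_commit_cache(rev_list_output):
    """Parse the text output of `git rev-list --all --parents`.

    Hypothetical helper: returns (order, parents, children), where
    `order` is the flattened history as listed by rev-list, and
    `parents`/`children` map a commit id to a list of commit ids.
    """
    order = []
    parents = {}
    children = {}
    for line in rev_list_output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        sha, parent_shas = fields[0], fields[1:]
        order.append(sha)
        parents[sha] = parent_shas
        children.setdefault(sha, [])
        for parent in parent_shas:
            children.setdefault(parent, []).append(sha)
    return order, parents, children


def neighbours(order, sha):
    """Return (previous, next) of `sha` in the flattened history.

    rev-list prints newest commits first, so the "previous" (older)
    changeset is the following list entry.
    """
    i = order.index(sha)
    older = order[i + 1] if i + 1 < len(order) else None
    newer = order[i - 1] if i > 0 else None
    return older, newer
```

With such tables held in memory, one rev-list call replaces a separate fork() per lookup. The trade-off is that the cache has to be rebuilt or invalidated when the repository changes.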

comment:5 Changed 7 years ago by hvr

(In [3199]) GitPlugin: call popen3 with sequence instead of string as command, in order to avoid shell-overhead (addresses #746)
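
As a rough illustration of the sequence-versus-string point, here is a sketch using Python's subprocess module rather than the popen3 call mentioned above. The helper names git_argv and run_command are hypothetical and not taken from GitPlugin. When the command is passed as a list, the program is executed directly, so no intermediate shell process is started and no shell quoting is needed.

```python
import subprocess


def git_argv(git_bin, subcommand, *args):
    """Build the argument vector for a git call (hypothetical helper)."""
    return [git_bin, subcommand] + list(args)


def run_command(argv):
    """Run `argv` directly, without going through a shell.

    Passing a sequence (and leaving shell=False, the default) means
    the program is executed as-is, with no /bin/sh in between.
    Returns (returncode, stdout_bytes, stderr_bytes).
    """
    proc = subprocess.Popen(list(argv),
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return proc.returncode, out, err
```

For example, run_command(git_argv("/usr/bin/git", "rev-list", "--all")) would start git directly. The path to the git binary here is only a placeholder.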

comment:6 Changed 7 years ago by hvr

additional notes:

  • try to create an additional libgit-thin-based Storage in order to evaluate whether the exec+fork+parse overhead is still significant
  • most external git calls can be avoided now thanks to extensive caching of meta-data; one of the remaining speed killers is listing directories in the source browser (cat-file -s calls can be optimized, by using GIT 1.5.3+ -l option to ls-tree; but having to call rev-list for each folder element still remains an issue... maybe it'll be possible to get an enhancement to ls-tree merged upstream to provide that information as well...)
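
A minimal sketch of the ls-tree idea follows. It assumes the directory listing comes from `git ls-tree -l <rev>`, which prints the object size as a fourth column (or "-" for entries that are not blobs), so no per-file `cat-file -s` call is needed. The function name parse_ls_tree_long is hypothetical. The parser also assumes plain file names: without -z, git quotes unusual path names, which this sketch does not undo.

```python
def parse_ls_tree_long(output):
    """Parse the text output of `git ls-tree -l`.

    Each line has the form
        <mode> <type> <object> <size>\t<name>
    where <size> is padded with spaces and is "-" for non-blob entries.
    Returns a list of dicts; `size` is an int, or None when not given.
    """
    entries = []
    for line in output.splitlines():
        if not line:
            continue  # skip blank lines
        meta, _, name = line.partition("\t")
        mode, obj_type, sha, size = meta.split()
        entries.append({
            "mode": mode,
            "type": obj_type,
            "sha": sha,
            "size": None if size == "-" else int(size),
            "name": name,
        })
    return entries
```

This yields all sizes for one directory from a single git call. The per-entry rev-list call mentioned above is not addressed by it.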

comment:7 Changed 7 years ago by osmaker+gitplugin@…

I thought I should mention this since hvr is considering a similar thing with libgit-thin.

I've been working on a small C library to help with performance on this and another project. (I chose not to use libgit-thin, as it has memory leaks which can't be fixed without some significant changes to Git itself.)

The goals for it are simple:

  • Read Git objects (no write support).
  • Do so with high performance, providing various methods which can be tuned to the specific application/command.

I've created a Python wrapper around the library and have implemented some basic Git commands with it in Python (ls-tree, rev-list, cat-file, etc). Already, most calls are orders of magnitude faster than using the git binaries via popen3.

I plan to have a public, working version available by the end of April. (And hopefully some simple patches for low-performance areas of GitPlugin.)

comment:8 follow-up: Changed 6 years ago by jarin.franek@…

I tried Trac with the git plugin on the Linux git repo. Without the cache (i.e. cached_repository=false, persistent_cache=false), everything worked well. Turning the cache on, however, gave me a headache: the timeline did not load within 10 minutes at 100% CPU load (I gave up then). Browsing the source showed only revisions 3 or 4 years old (still at 100% CPU).

I guess there are some bugs in the caching code. Should I file this as a new bug report, or is it fine to leave it on this ticket (as it is about performance)?

To replicate the issue:

  1. trac, git-plugin svn5076, git, python 2.5.2 installed
  2. git clone git:// linux-2.6
  3. mkdir trac
  4. trac-admin ./trac initenv (fill in the data appropriately)
  5. in the trac environment config file:
    1. enable git plugin
    2. enable cache
    3. set git_bin to the full path to the git binary (it ceases to work with only 'git' there, at least for me)
  6. launch tracd --port <your.choice> <path_to_the_trac_env>
  7. try to access the project timeline page via e.g. Firefox,
  8. 100% CPU, no page shown...

and

  7b. try to access the code browser at http://localhost:<your.port>/trac/browser
  8b. very old revisions shown, 100% CPU

Note that with a small git repo I had no difficulties with using cache.

comment:9 in reply to: ↑ 8 Changed 5 years ago by georgyo

I was having serious performance issues, and caching was the problem.

Once a repo grows past 500 commits, Trac starts to become very slow if caching is enabled.

comment:10 Changed 5 years ago by anonymous

  • Cc gseanmcg@… added; anonymous removed

comment:11 Changed 5 years ago by anonymous

  • Cc peter@… added

comment:12 Changed 3 years ago by anonymous

fyi, was just merged which is supposed to have a positive effect on performance

comment:13 Changed 3 years ago by stuge

  • Cc stuge added; peter@… removed
  • Owner changed from hvr to stuge
  • Status changed from assigned to new

I started work on a libgit2/pygit2 based rewrite of the plugin. The old plugin has meanwhile been included in Trac trunk, and my work to replace it is also done against Trac trunk. I guess that this ticket should be closed, and interested parties should monitor the following pages for progress:

comment:14 Changed 3 years ago by stuge

  • Status changed from new to assigned
