Opened 10 years ago

Closed 10 years ago

Large p4 depots kill virtual memory (patch included)

Reported by: r.blum@…
Owned by: lewisbaker
Priority: normal
Component: PerforcePlugin
Severity: normal
Keywords: needinfo
Trac Release: 0.10

If you do an initial sync to a large depot (50,000 changelists), the Perforce plugin will use up VM like crazy. (I stopped it once it reached about 1.5 GB.)

A small modification to the sync procedure solves that problem (basically, get changelists in chunks of 1000). Here's my local mod:

    # Override sync to precache data to make it run faster.
    def sync(self):
        youngest_stored = self.repos.get_youngest_rev_in_cache(self.db)
        if youngest_stored is None:
            youngest_stored = '0'

        while youngest_stored != str(self.repos.youngest_rev):
            # Need to cache all information for changes since the last
            # sync operation, in chunks of at most 1000 changelists.
            youngest_to_get = self.repos.youngest_rev
            if youngest_to_get > int(youngest_stored) + 1000:
                youngest_to_get = int(youngest_stored) + 1000

            # Obtain a list of changes since the last cache sync.
            from p4trac.repos import _P4ChangesOutputConsumer
            output = _P4ChangesOutputConsumer(self.repos._repos)
            self.repos._connection.run('changes', '-l', '-s', 'submitted',
                                       '@>%s,%d' % (youngest_stored,
                                                    youngest_to_get),
                                       output=output)

            if output.errors:
                from p4trac.repos import PerforceError
                raise PerforceError(output.errors)

            changes = output.changes
            changes.reverse()

            # Perform the precaching of the file history for files in
            # these changes.
            self.repos._repos.precacheFileHistoryForChanges(changes)

            youngest_stored = str(youngest_to_get)

        # Call on to the default implementation now that we've cached
        # enough information to make it run a bit faster.
        CachedRepository.sync(self)
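
(For a sense of what each pass does: with, say, 12,000 changes already cached, the loop runs the equivalent of 'p4 changes -l -s submitted @>12000,13000', so it only ever has to buffer about 1,000 changelists before handing them off to precacheFileHistoryForChanges().)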


comment:1 Changed 10 years ago by lewisbaker

• Resolution set to duplicate
• Status changed from new to closed

Duplicate of #630.

comment:2 Changed 10 years ago by lewisbaker

Do you know if the excessive virtual memory usage was occurring during the initial call to 'p4 changes' or in the call to precacheFileHistoryForChanges()?

precacheFileHistoryForChanges() basically retrieves all information about the changes (with a combination of 'p4 describe' and 'p4 filelog' commands) and caches it in an internal data structure, ready for the call to CachedRepository.sync().
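
In rough terms, each batch amounts to something like the sketch below; the p4() subprocess helper and the dict cache are made-up stand-ins for illustration, not the plugin's actual internals:

    import subprocess

    def p4(*args):
        # Hypothetical helper: run a p4 command, return its output lines.
        return subprocess.check_output(('p4',) + args).decode().splitlines()

    def precache_file_history_for_changes(changes):
        cache = {}
        for change in changes:
            # 'p4 describe -s' lists the files touched by a change (no diffs);
            # affected files appear as lines like '... //depot/path#3 edit'.
            for line in p4('describe', '-s', str(change)):
                if line.startswith('... //'):
                    depot_file = line[4:].split('#')[0]
                    # 'p4 filelog' fetches the full revision history per file.
                    cache[depot_file] = p4('filelog', depot_file)
        return cache  # all of it stays resident until CachedRepository.sync()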

I'm not sure that batching up calls to precacheFileHistoryForChanges() is going to fix the problem entirely, since the same data will all be held in memory by the time the outer loop exits anyway. However, I am curious as to how/why your patch has alleviated your memory usage problems. Any more info you can give would be a great help.

comment:3 Changed 10 years ago by athomas

• Description modified (diff)

Fixed formatting.

comment:4 Changed 10 years ago by r.blum@…

It is both in 'p4 changes' and in precacheFileHistoryForChanges(). The point is that I've got a repository with about 50,000 changes :)

Breaking it into chunks of 1,000 made the inner loop progress quite nicely, and allowed me to reach the CachedRepository.sync(self) call at all. In the original version, the process failed after having used up 2 GB of memory.

It's still slurping up about 750 MB of memory until it's done, and it spends lots of time GC'ing. (I'm guessing; what else would happen at shutdown? I'm using 'trac-admin resync', and just the 'quit' takes about 5 minutes after the resync.)

Off the cuff, I'd suggest you only fetch changelists and changelist descriptions on demand, and cache them as they come in. Of course, I've got no idea if Trac would even let you do that ;)
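
Purely as a sketch of the idea (the p4() helper and the class are invented for illustration; I don't know what the plugin or Trac interfaces actually allow):

    import subprocess

    def p4(*args):
        # Hypothetical helper: run a p4 command and return its output.
        return subprocess.check_output(('p4',) + args).decode()

    class LazyChangeCache(object):
        # Fetch a changelist's description only when it is first asked
        # for, then serve repeat lookups from memory.
        def __init__(self):
            self._descriptions = {}

        def get_description(self, change):
            if change not in self._descriptions:
                # 'p4 describe -s' gets one description without diffs, so
                # only the changes actually visited ever occupy memory.
                self._descriptions[change] = p4('describe', '-s', str(change))
            return self._descriptions[change]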

comment:5 Changed 10 years ago by lewisbaker

This should hopefully be addressed by the changes committed in [1197]. Can you confirm whether that fixes the problem for you? If not, ticket #630 should be reopened and further discussion should occur there.

comment:6 Changed 10 years ago by r.blum@…

I'm reluctant to call for a reopening; I'm not sure if [1197] fixes the issue.

It's now (after a bit of grinding) failing with "Command failed: 'depotFile'" (filed as #665).

It seems faster and doesn't churn through memory, though, so it looks OK in terms of VM. (Well, instantaneous would be better than 25 minutes, but at least it doesn't crash ;)