Opened 12 years ago

Closed 12 years ago

# Repo Search indexer hangs when using default PYTHON_EGG_CACHE dir

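| Field | Value |
|---|---|
| Reported by | anonymous |
| Owned by | Alec Thomas |
| Priority | normal |
| Component | RepoSearchPlugin |
| Severity | normal |
| Cc | jeoffwilks@… |
| Trac Release | 0.9 |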

### Description

I downloaded the ZIP, extracted, ran python setup.py bdist_egg and then copied the egg file to my trac instance's plugins directory. The [ ] Search Repository option now displays, but when I use it, the web request just hangs.

Is this just the lag of the initial indexing, or has something gone horribly wrong? There is no indication of any status or error in trac.log.

I'm using Trac 0.9.3 on Debian Linux, and I set up trac.ini as described in the configuration file (along with trying several other variations).

### comment:2 Changed 12 years ago by Alec Thomas

If you have debug level logging enabled you should see a new log message for each file that is indexed.

Here is a way to force an index from the command line:

    import sys

    from trac.env import Environment
    from tracreposearch.indexer import Indexer

    # The path to the Trac environment is the first command-line argument.
    env = Environment(sys.argv[1])
    indexer = Indexer(env)
    if indexer.need_reindex():
        print("Reindexing")
        indexer.reindex()
    else:
        print("Reindex not required")


### comment:3 Changed 12 years ago by jeoffwilks@…

So suppose the repo is very large and it's presumably indexing, and then the web request gets canceled. Will that cancel the indexing, or does it continue to run?

Anyway, I will try the command line script tomorrow. Thanks for posting it.

### comment:4 Changed 12 years ago by jeoffwilks@…

Summary: Repo Search just hangs... → Repo Search indexer hangs when using default PYTHON_EGG_CACHE dir

Trac Release: → 0.9

The RepoSearchPlugin wiki page says,

"The indexer is enabled by default and will use the PYTHON_EGG_CACHE dir to store its data."

I initially did not set the "index" option under [repo-search] because I have set PYTHON_EGG_CACHE=/var/cache/apache2. The indexing appeared to be working (uses a lot of CPU for a long time), but nothing was written anywhere under the PYTHON_EGG_CACHE directory. That was the case when I tried both within the web server (by conducting a search), and as root using the command line script provided above.

I've now set the index option, and it seems to be writing files to the directory I specified. Is there some special caveat to making the indexer work with the default PYTHON_EGG_CACHE directory?

### comment:5 Changed 12 years ago by anonymous

OK, after reading the code I now see that it places the index in a hidden directory .idx within the PYTHON_EGG_CACHE directory.

    self.index_dir = self.env.config.get('repo-search', 'index',
                     os.path.join(os.getenv('PYTHON_EGG_CACHE', ''), '.idx'))
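To make the fallback in that snippet concrete, here is a minimal standalone sketch of the same resolution logic. The function name `default_index_dir` and the example paths are hypothetical, and the plugin's real config lookup goes through `env.config.get`, which is only mimicked here by a plain argument.

```python
import os


def default_index_dir(configured=None, environ=None):
    """Mimic the fallback quoted above (hypothetical helper, not plugin code).

    configured -- value of the [repo-search] index option, if set
    environ    -- mapping to read PYTHON_EGG_CACHE from (defaults to os.environ)
    """
    environ = os.environ if environ is None else environ
    if configured:
        return configured
    # With no option set, the index lands in a hidden ".idx" directory
    # inside PYTHON_EGG_CACHE, or relative to the CWD if that is unset.
    return os.path.join(environ.get('PYTHON_EGG_CACHE', ''), '.idx')


if __name__ == '__main__':
    print(default_index_dir(environ={'PYTHON_EGG_CACHE': '/var/cache/apache2'}))
    print(default_index_dir(environ={}))
    print(default_index_dir('/srv/trac/index'))
```

Because the directory name starts with a dot, a plain `ls` of the cache directory will not show it, which would explain why nothing appeared to be written there.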


That definitely had me tripped up. I have 2 recommendations:

1. When the indexer needs to recreate the entire index, have it issue a single log statement at a higher logging level (perhaps even ERROR) that includes the directory where the index will be written. For example:

       ERROR: No index found at '/var/cache/apache2/.idx' - Creating index.

2. I suggest using a default path that is unique for each Trac instance (because in some cases a server will have several Trac instances running), e.g.

       $PYTHON_EGG_CACHE/trac_index_$TRAC_ENV_NAME

   ...where $TRAC_ENV_NAME would be the name of the specific Trac instance being indexed.
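The two recommendations above could be sketched roughly as follows. This is an illustration only, not the plugin's code: the helper names `per_env_index_dir` and `ensure_index_dir`, the logger name, and the example environment path are all assumptions.

```python
import logging
import os
import tempfile

# Hypothetical logger name; the plugin would use the Trac environment's log.
log = logging.getLogger('reposearch-sketch')


def per_env_index_dir(env_path, egg_cache):
    """Build $PYTHON_EGG_CACHE/trac_index_$TRAC_ENV_NAME for one environment."""
    env_name = os.path.basename(os.path.normpath(env_path))
    return os.path.join(egg_cache, 'trac_index_' + env_name)


def ensure_index_dir(index_dir):
    """Create the index directory if missing; return True if it was created."""
    if os.path.isdir(index_dir):
        return False
    # One prominent message that names the directory about to be written.
    log.error("No index found at '%s' - Creating index.", index_dir)
    os.makedirs(index_dir)
    return True


if __name__ == '__main__':
    logging.basicConfig()
    cache = tempfile.mkdtemp()  # stands in for PYTHON_EGG_CACHE
    index_dir = per_env_index_dir('/srv/trac/projectA', cache)
    ensure_index_dir(index_dir)
```

Note that naming by the environment's directory name alone would still collide if two environments share a name under different parent directories.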

### comment:6 Changed 12 years ago by Alec Thomas

Most people don't have logging enabled at all, so putting it at a higher level probably isn't going to help with finding it after the fact. I'll update the code though, as it can't hurt, and I'll also document the behaviour more thoroughly in the wiki page.

Your second point is well founded; I'll fix that as well. Not sure how to do this in a backwards compatible way though...hmmm.

### comment:7 Changed 12 years ago by jeoffwilks@…

Thanks for your work on this plugin. I did finally get everything indexed, and it was very cool to see the search results come up almost instantaneously.

### comment:8 Changed 12 years ago by Alec Thomas

My pleasure, I use it on TracHacks, so it scratched an itch :)

You might want to consider running the indexer under cron every night, or hour. That might alleviate the waiting somewhat.

### comment:9 Changed 12 years ago by Alec Thomas

Resolution: → fixed

Status: new → closed

Fixed in r671. The index for each environment is given a unique name based on the path to the environment.
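One way to derive a unique index name from the environment path is to hash it. The sketch below is an assumption for illustration and is not taken from r671; the function name `index_dir_for_env`, the `reposearch-` prefix, and the example paths are hypothetical.

```python
import hashlib
import os


def index_dir_for_env(env_path, base_dir):
    """Return a per-environment index directory under base_dir.

    The name is derived from a hash of the absolute environment path, so
    environments with the same directory name under different parents
    still get distinct index directories.
    """
    normalized = os.path.abspath(env_path)
    digest = hashlib.sha1(normalized.encode('utf-8')).hexdigest()
    return os.path.join(base_dir, 'reposearch-' + digest[:12] + '.idx')


if __name__ == '__main__':
    base = '/var/cache/apache2'  # stands in for PYTHON_EGG_CACHE
    print(index_dir_for_env('/srv/trac/projectA', base))
    print(index_dir_for_env('/home/other/projectA', base))
```

A hash keeps the name filesystem-safe regardless of what characters appear in the path, at the cost of the directory name no longer being human-readable.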

### comment:10 Changed 12 years ago by Alec Thomas

(also, massive speedups, you should upgrade!)
