Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#275 closed defect (fixed)

Repo Search indexer hangs when using default PYTHON_EGG_CACHE dir

Reported by: anonymous Owned by: Alec Thomas
Priority: normal Component: RepoSearchPlugin
Severity: normal Keywords:
Cc: jeoffwilks@… Trac Release: 0.9


I downloaded the ZIP, extracted it, ran python setup.py bdist_egg, and then copied the egg file to my Trac instance's plugins directory. The [ ] Search Repository option now displays, but when I use it, the web request just hangs.

Is this just the lag of the initial indexing, or has something gone horribly wrong? There is no indication of any status or error in trac.log.

I'm using Trac 0.9.3 on Debian Linux, and I set up trac.ini as described in the configuration file (along with trying several other variations).

Change History (10)

comment:1 Changed 11 years ago by anonymous

Cc: jeoffwilks@… added; anonymous removed

comment:2 Changed 11 years ago by Alec Thomas

If you have debug level logging enabled you should see a new log message for each file that is indexed.

Here is a way to force an index from the command line:

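import sys
from trac.env import Environment
from tracreposearch.indexer import Indexer

# Usage: python <this script> /path/to/trac/env
env = Environment(sys.argv[1])
indexer = Indexer(env)
if indexer.need_reindex():
	print "Reindexing"
	indexer.reindex()  # assumed name of the rebuild call missing from the snippet as posted
else:
	print "Reindex not required"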

comment:3 Changed 11 years ago by jeoffwilks@…

So suppose the repo is very large and it is presumably still indexing, and then the web request gets canceled. Will that cancel the indexing, or does it continue to run?

Anyway, I will try the command line script tomorrow. Thanks for posting it.

comment:4 Changed 11 years ago by jeoffwilks@…

Summary: Repo Search just hangs... → Repo Search indexer hangs when using default PYTHON_EGG_CACHE dir
Trac Release: 0.9

The RepoSearchPlugin wiki page says,

"The indexer is enabled by default and will use the PYTHON_EGG_CACHE dir to store its data."

I initially did not set the "index" option under [repo-search] because I have set PYTHON_EGG_CACHE=/var/cache/apache2. The indexing appeared to be working (it used a lot of CPU for a long time), but nothing was written anywhere under the PYTHON_EGG_CACHE directory. That was the case both when I triggered it within the web server (by conducting a search) and when I ran the command line script provided above as root.

I've now set the index option, and it seems to be writing files to the directory I specified. Is there some special caveat to making the indexer work with the default PYTHON_EGG_CACHE directory?

comment:5 Changed 11 years ago by anonymous

OK, after reading the code I now see that it places the index in a hidden directory .idx within the PYTHON_EGG_CACHE directory.

self.index_dir = self.env.config.get('repo-search', 'index',
                 os.path.join(os.getenv('PYTHON_EGG_CACHE', ''), '.idx'))
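To see why nothing visible showed up, it may help to evaluate just the fallback expression from the lines quoted above. The sketch below is only an illustration: `default_index_dir` is a hypothetical helper, not part of the plugin, and it takes the environment as a plain dict so both cases can be tried side by side. It applies only when the "index" option under [repo-search] is not set.

```python
import os


def default_index_dir(environ):
    """Evaluate the same fallback expression as the quoted plugin code.

    `environ` is any mapping standing in for os.environ (hypothetical
    parameter, used here so both cases can be shown without touching
    the real process environment).
    """
    return os.path.join(environ.get('PYTHON_EGG_CACHE', ''), '.idx')


# With the cache directory set as in this ticket, the index lands in a
# hidden directory inside it.
with_cache = default_index_dir({'PYTHON_EGG_CACHE': '/var/cache/apache2'})

# With PYTHON_EGG_CACHE unset, the first component is empty, so the
# result is a relative path resolved against the process's working
# directory.
without_cache = default_index_dir({})
```

On a POSIX system `with_cache` is `/var/cache/apache2/.idx`, which a plain `ls` will not list because of the leading dot, and `without_cache` is the relative path `.idx`.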

That definitely had me tripped up. I have 2 recommendations:

  1. When the indexer needs to recreate the entire index, have it issue a single log statement at a higher logging level (perhaps even ERROR) that includes the directory where the index will be written. For example:
    ERROR: No index found at '/var/cache/apache2/.idx' - Creating index.
  2. I suggest using a default path that is unique for each Trac instance (because in some cases a server will have several Trac instances running). e.g.

...where $TRAC_ENV_NAME would be the name of the specific Trac instance being indexed.
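The first recommendation above could be sketched roughly as follows. This is not the plugin's code: `announce_index_creation`, its parameters, and the logger name are all hypothetical, and it uses the standard `logging` module directly rather than Trac's `env.log`. It only shows the idea of emitting one message that names the index directory before a full rebuild starts.

```python
import logging
import os

# Hypothetical logger name; inside Trac one would use env.log instead.
log = logging.getLogger('reposearch.sketch')


def announce_index_creation(index_dir, level=logging.ERROR, logger=log):
    """Log a single message naming index_dir if no index exists there yet.

    Returns True when the message was emitted (the index is missing and
    would need to be created), False when the directory already exists.
    """
    if os.path.isdir(index_dir):
        return False
    logger.log(level, "No index found at '%s' - Creating index.", index_dir)
    return True
```

The level is left as a parameter because, as the reply below points out, a higher level only helps on installations that have logging enabled at all.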

comment:6 Changed 11 years ago by Alec Thomas

Most people don't have logging enabled at all, so putting it at a higher level probably isn't going to help with finding it after the fact. I'll update the code though, as it can't hurt, and I'll also document the behaviour more thoroughly in the wiki page.

Your second point is well founded, I'll fix that as well. Not sure how to do this in a backwards compatible way though...hmmm.

comment:7 Changed 11 years ago by jeoffwilks@…

Thanks for your work on this plugin. I did finally get everything indexed, and it was very cool to see the search results come up almost instantaneously.

comment:8 Changed 11 years ago by Alec Thomas

My pleasure, I use it on TracHacks, so it scratched an itch :)

You might want to consider running the indexer under cron every night, or hour. That might alleviate the waiting somewhat.

comment:9 Changed 11 years ago by Alec Thomas

Resolution: fixed
Status: new → closed

Fixed in r671. The index for each environment is given a unique name based on the path to the environment.
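For readers wondering what a name "based on the path to the environment" might look like, here is one possible scheme. It is an assumption for illustration only and is not taken from r671: `index_dir_for_env`, the `.idx-` prefix, and the use of a SHA-1 digest are all hypothetical choices.

```python
import hashlib
import os


def index_dir_for_env(cache_dir, env_path):
    """Return an index directory under cache_dir that is unique per env_path.

    The environment path is normalized first so that trailing slashes or
    relative spellings of the same path map to the same index directory.
    """
    normalized = os.path.abspath(env_path)
    digest = hashlib.sha1(normalized.encode('utf-8')).hexdigest()
    return os.path.join(cache_dir, '.idx-' + digest)


# Two Trac environments sharing one cache directory get separate indexes.
dir_a = index_dir_for_env('/var/cache/apache2', '/srv/trac/project-a')
dir_b = index_dir_for_env('/var/cache/apache2', '/srv/trac/project-b')
```

Hashing the path avoids collisions between environments that share a base name but live in different parent directories, at the cost of a directory name that is not human-readable.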

comment:10 Changed 11 years ago by Alec Thomas

(also, massive speedups, you should upgrade!)
