
Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#275 closed defect (fixed)

Repo Search indexer hangs when using default PYTHON_EGG_CACHE dir

Reported by: anonymous
Owned by: athomas
Priority: normal
Component: RepoSearchPlugin
Severity: normal
Keywords:
Cc: jeoffwilks@…
Trac Release: 0.9

Description

I downloaded the ZIP, extracted, ran python setup.py bdist_egg and then copied the egg file to my trac instance's plugins directory. The [ ] Search Repository option now displays, but when I use it, the web request just hangs.

Is this just the lag of the initial indexing, or has something gone horribly wrong? There is no indication of any status or error in trac.log.

I'm using Trac 0.9.3 on Debian Linux, and I set up trac.ini as described in the configuration file (along with trying several other variations).

Attachments (0)

Change History (10)

comment:1 Changed 9 years ago by anonymous

  • Cc jeoffwilks@… added; anonymous removed

comment:2 Changed 9 years ago by athomas

If you have debug level logging enabled you should see a new log message for each file that is indexed.

Here is a way to force an index from the command line:

import sys

from trac.env import Environment
from tracreposearch.indexer import Indexer

# The path to the Trac environment is the first command-line argument.
env = Environment(sys.argv[1])
indexer = Indexer(env)
if indexer.need_reindex():
    print("Reindexing")
    indexer.reindex()
else:
    print("Reindex not required")

comment:3 Changed 9 years ago by jeoffwilks@…

So if the repo is very large and it's presumably indexing, suppose the web request gets canceled. Will that cancel the indexing, or does it continue to run?

Anyway, I will try the command line script tomorrow. Thanks for posting it.

comment:4 Changed 9 years ago by jeoffwilks@…

  • Summary changed from Repo Search just hangs... to Repo Search indexer hangs when using default PYTHON_EGG_CACHE dir
  • Trac Release set to 0.9

The RepoSearchPlugin wiki page says,

"The indexer is enabled by default and will use the PYTHON_EGG_CACHE dir to store its data."

I initially did not set the "index" option under [repo-search] because I have set PYTHON_EGG_CACHE=/var/cache/apache2. The indexing appeared to be working (it used a lot of CPU for a long time), but nothing was written anywhere under the PYTHON_EGG_CACHE directory. That was the case both when I ran it within the web server (by conducting a search) and when I ran the command line script provided above as root.

I've now set the index option, and it seems to be writing files to the directory I specified. Is there some special caveat to making the indexer work with the default PYTHON_EGG_CACHE directory?

comment:5 Changed 9 years ago by anonymous

OK, after reading the code I now see that it places the index in a hidden directory .idx within the PYTHON_EGG_CACHE directory.

http://trac-hacks.org/browser/reposearchplugin/0.9/tracreposearch/indexer.py#L93

92  self.index_dir = self.env.config.get('repo-search', 'index',
93                   os.path.join(os.getenv('PYTHON_EGG_CACHE', ''), '.idx'))
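
To make the fallback easier to see, here is a minimal sketch of how that default resolves. The helper name `default_index_dir` is hypothetical and takes the environment variables as a plain dict for illustration; only the `os.path.join(..., '.idx')` expression comes from the plugin code quoted above.

```python
import os

def default_index_dir(environ):
    """Mirror the fallback expression from indexer.py (hypothetical helper)."""
    return os.path.join(environ.get('PYTHON_EGG_CACHE', ''), '.idx')

# With the cache dir set, the index lands in a hidden subdirectory of it.
print(default_index_dir({'PYTHON_EGG_CACHE': '/var/cache/apache2'}))

# With it unset, the result is a relative path, so the index ends up
# wherever the process's working directory happens to be.
print(default_index_dir({}))
```

Because the directory name starts with a dot, a plain `ls` of the cache directory will not show it; `ls -a` will.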

That definitely had me tripped up. I have 2 recommendations:

  1. When the indexer needs to recreate the entire index, have it issue a single log statement at a higher logging level (perhaps even ERROR) that includes the directory where the index will be written. For example:
    ERROR: No index found at '/var/cache/apache2/.idx' - Creating index.
    
  2. I suggest using a default path that is unique for each Trac instance (because in some cases a server will have several Trac instances running). e.g.
    $PYTHON_EGG_CACHE/trac_index_$TRAC_ENV_NAME
    

...where $TRAC_ENV_NAME would be the name of the specific Trac instance being indexed.
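
The two recommendations could be sketched roughly as follows. This is an illustration only, not the plugin's code: `index_dir_for` and `announce_reindex` are hypothetical names, and the standard `logging` module stands in for Trac's own logger.

```python
import logging
import os

# Stand-in for the Trac environment's logger (assumption for this sketch).
log = logging.getLogger('reposearch-sketch')

def index_dir_for(egg_cache, trac_env_name):
    """Recommendation 2: one index directory per Trac instance (hypothetical)."""
    return os.path.join(egg_cache, 'trac_index_' + trac_env_name)

def announce_reindex(index_dir):
    """Recommendation 1: report where the index is about to be written."""
    message = "No index found at '%s' - Creating index." % index_dir
    log.error(message)
    return message

announce_reindex(index_dir_for('/var/cache/apache2', 'myproject'))
```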

comment:6 Changed 9 years ago by athomas

Most people don't have logging enabled at all, so putting it at a higher level probably isn't going to help with finding it after the fact. I'll update the code though, as it can't hurt, and I'll also document the behaviour more thoroughly in the wiki page.

Your second point is well founded; I'll fix that as well. Not sure how to do this in a backwards-compatible way though... hmmm.

comment:7 Changed 9 years ago by jeoffwilks@…

Thanks for your work on this plugin. I did finally get everything indexed, and it was very cool to see the search results come up almost instantaneously.

comment:8 Changed 9 years ago by athomas

My pleasure, I use it on TracHacks, so it scratched an itch :)

You might want to consider running the indexer under cron every night, or hour. That might alleviate the waiting somewhat.

comment:9 Changed 9 years ago by athomas

  • Resolution set to fixed
  • Status changed from new to closed

Fixed in r671. The index for each environment is given a unique name based on the path to the environment.
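
One way to derive such a name, as a sketch only: hash the absolute environment path, so that two environments with the same directory name under different parents still get distinct indexes. The actual scheme used in r671 is not shown in this ticket; `unique_index_dir` and the `.idx-` prefix are assumptions made for illustration.

```python
import hashlib
import os

def unique_index_dir(base_dir, env_path):
    """Hypothetical: name the index after a digest of the environment path."""
    normalized = os.path.abspath(env_path)
    digest = hashlib.sha1(normalized.encode('utf-8')).hexdigest()[:12]
    return os.path.join(base_dir, '.idx-' + digest)

# Two environments sharing one cache directory get separate index dirs.
a = unique_index_dir('/var/cache/apache2', '/srv/trac/projects/alpha')
b = unique_index_dir('/var/cache/apache2', '/srv/trac/projects/beta')
print(a)
print(b)
```

The same environment path always maps to the same directory, so an existing index is found again on the next run.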

comment:10 Changed 9 years ago by athomas

(also, massive speedups, you should upgrade!)
