#275 closed defect (fixed)
Repo Search indexer hangs when using default PYTHON_EGG_CACHE dir
Reported by: | anonymous | Owned by: | Alec Thomas |
---|---|---|---|
Priority: | normal | Component: | RepoSearchPlugin |
Severity: | normal | Keywords: | |
Cc: | jeoffwilks@… | Trac Release: | 0.9 |
Description
I downloaded the ZIP, extracted, ran python setup.py bdist_egg and then copied the egg file to my trac instance's plugins directory. The [ ] Search Repository option now displays, but when I use it, the web request just hangs.
Is this just the lag of the initial indexing, or has something gone horribly wrong? There is no indication of any status or error in trac.log.
I'm using Trac 0.9.3 on Debian Linux, and I set up trac.ini as described in the configuration file (along with trying several other variations).
Change History (10)
comment:1 Changed 19 years ago by
Cc: | jeoffwilks@… added; anonymous removed |
---|
comment:2 Changed 19 years ago by
comment:3 Changed 19 years ago by
So if the repo is very large and it's presumably indexing, suppose the web request gets canceled. Will that cancel the indexing, or does it continue to run?
Anyway, I will try the command line script tomorrow. Thanks for posting it.
comment:4 Changed 19 years ago by
Summary: | Repo Search just hangs... → Repo Search indexer hangs when using default PYTHON_EGG_CACHE dir |
---|---|
Trac Release: | → 0.9 |
The RepoSearchPlugin wiki page says,
"The indexer is enabled by default and will use the PYTHON_EGG_CACHE dir to store its data."
I initially did not set the "index" option under [repo-search] because I have set PYTHON_EGG_CACHE=/var/cache/apache2. The indexing appeared to be working (uses a lot of CPU for a long time), but nothing was written anywhere under the PYTHON_EGG_CACHE directory. That was the case when I tried both within the web server (by conducting a search), and as root using the command line script provided above.
I've now set the index option, and it seems to be writing files to the directory I specified. Is there some special caveat to making the indexer work with the default PYTHON_EGG_CACHE directory?
comment:5 Changed 19 years ago by
OK, after reading the code I now see that it places the index in a hidden directory .idx within the PYTHON_EGG_CACHE directory.
http://trac-hacks.org/browser/reposearchplugin/0.9/tracreposearch/indexer.py#L93
    self.index_dir = self.env.config.get('repo-search', 'index',
        os.path.join(os.getenv('PYTHON_EGG_CACHE', ''), '.idx'))
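To make the fallback above concrete, here is a minimal standalone sketch of how that default resolves. The function name default_index_dir and its parameters are hypothetical and not part of the plugin; only the os.path.join expression is taken from the quoted code. It is written to run on a current Python rather than the interpreter Trac 0.9 used.

```python
import os

def default_index_dir(environ, configured=None):
    """Hypothetical helper mirroring the quoted fallback.

    If an explicit 'index' value is configured, use it; otherwise fall
    back to a hidden '.idx' directory under PYTHON_EGG_CACHE.
    """
    if configured is not None:
        return configured
    # Same expression as the quoted plugin code, with os.getenv replaced
    # by a dict lookup so the behaviour can be tried without touching
    # the real process environment.
    return os.path.join(environ.get('PYTHON_EGG_CACHE', ''), '.idx')

# With the cache set as in this ticket, the index lands in a dot-directory:
print(default_index_dir({'PYTHON_EGG_CACHE': '/var/cache/apache2'}))
# With the variable unset, the result is the relative path '.idx',
# which resolves against the process's current working directory:
print(default_index_dir({}))
```

Because the directory name starts with a dot, a plain ls of the cache directory will not show it; ls -a will.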
That definitely had me tripped up. I have 2 recommendations:
- When the indexer needs to recreate the entire index, have it issue a single log statement at a higher logging level (perhaps even ERROR) that includes the directory where the index will be written. For example:
ERROR: No index found at '/var/cache/apache2/.idx' - Creating index.
- I suggest using a default path that is unique for each Trac instance (because in some cases a server will have several Trac instances running). e.g.
$PYTHON_EGG_CACHE/trac_index_$TRAC_ENV_NAME
...where $TRAC_ENV_NAME would be the name of the specific Trac instance being indexed.
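The two recommendations above could be sketched roughly as follows. This is an illustration only, not the plugin's code: per_env_index_dir, ensure_index_dir and the logger name are hypothetical, and WARNING is chosen here simply as one example of "a higher logging level".

```python
import logging
import os

log = logging.getLogger('reposearch.sketch')  # hypothetical logger name

def per_env_index_dir(egg_cache, env_path):
    """Build $PYTHON_EGG_CACHE/trac_index_$TRAC_ENV_NAME for one environment."""
    # normpath drops a trailing slash so basename yields the env name.
    env_name = os.path.basename(os.path.normpath(env_path))
    return os.path.join(egg_cache, 'trac_index_' + env_name)

def ensure_index_dir(index_dir):
    """Create the index directory if missing.

    Returns True when the directory had to be created, which is the
    case where the whole index would need to be rebuilt.
    """
    if os.path.isdir(index_dir):
        return False
    # One message, at a level likely to be visible, naming the location.
    log.warning("No index found at '%s' - Creating index.", index_dir)
    os.makedirs(index_dir)
    return True
```

Note that two environments with the same directory name under different parents (for example /srv/a/trac and /srv/b/trac) would still share an index with this naming scheme.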
comment:6 Changed 19 years ago by
Most people don't have logging enabled at all, so putting it at a higher level probably isn't going to help with finding it after the fact. I'll update the code though, as it can't hurt, and I'll also document the behaviour more thoroughly in the wiki page.
Your second point is well founded, I'll fix that as well. Not sure how to do this in a backwards compatible way though...hmmm.
comment:7 Changed 19 years ago by
Thanks for your work on this plugin. I did finally get everything indexed, and it was very cool to see the search results come up almost instantaneously.
comment:8 Changed 19 years ago by
My pleasure, I use it on TracHacks, so it scratched an itch :)
You might want to consider running the indexer under cron every night, or hour. That might alleviate the waiting somewhat.
comment:9 Changed 19 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
Fixed in r671. The index for each environment is given a unique name based on the path to the environment.
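One way to derive a unique name from an environment path is to hash it. The sketch below is an assumption about how such a scheme could look, not the code committed in r671; unique_index_name and the 'idx_' prefix are hypothetical.

```python
import hashlib
import os

def unique_index_name(env_path):
    """Derive a stable directory name from the environment's path."""
    # abspath also normalises the path, so a trailing slash or a
    # relative spelling of the same directory gives the same name.
    canonical = os.path.abspath(env_path)
    digest = hashlib.sha1(canonical.encode('utf-8')).hexdigest()
    # A short prefix of the digest keeps the directory name readable.
    return 'idx_' + digest[:12]

# Environments with the same basename but different parents get
# different index names:
print(unique_index_name('/srv/a/trac') != unique_index_name('/srv/b/trac'))
```

Hashing the full path rather than using the environment's directory name avoids collisions between same-named environments, at the cost of a name that is not human-readable.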
If you have debug level logging enabled you should see a new log message for each file that is indexed.
Here is a way to force an index from the command line: