Modify

Opened 2 years ago

Closed 2 years ago

#10619 closed defect (fixed)

initial indexing of a repository fails (postgres)

Reported by: anonymous Owned by: ejucovy
Priority: highest Component: MultiRepoSearchPlugin
Severity: blocker Keywords:
Cc: Trac Release: 0.12

Description

I just installed MultiRepoSearchPlugin and after following the instructions I got to the following error:

omega# trac-admin /var/trac/tcsnp multireposearch reindex_all
Repo  DOES need reindexing
Fetching content at development/tcsnp/Makefile.PL
...
...
OperationalError: index row size 7280 exceeds btree maximum, 2713
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.

Upon further investigation, it seems this is a pretty common error in PostgreSQL, when trying to index a large TEXT column with a btree-type index:

http://archives.postgresql.org/pgsql-general/2011-03/msg00436.php
https://code.djangoproject.com/ticket/14904

Is this related to the way that MultiRepoSearchPlugin stores data in the schema? (e.g. can it be changed to make it work in Postgres?)

Or maybe you can use full text indexing (instead of btree-indexing) to achieve the same?

Attachments (0)

Change History (4)

comment:1 Changed 2 years ago by ejucovy

  • Status changed from new to assigned

Thanks for the report and very helpful research. The code creates an index on the repository_node.contents column since that column is used in the searches, but I admit I didn't give it a lot of thought up front. It actually seems that the default btree index for this text column is completely [worse than] useless for the sorts of text searches the plugin does, since they're not just searching for patterns at the beginning of the string.

So, you're completely right that full text indexing would be preferable to btree indexing. I'll need to do more research to figure out how to actually implement that. But in the meantime, I think that removing the indexing altogether won't hurt the performance of the plugin at all, and will bring added benefits like less space consumed, and the ability for you to use it at all. :-)

I've pushed a change to the code to remove the index. I haven't tagged a new release yet, or defined a migration script, because I'd like to make sure this approach helps before releasing it. Could you try re-installing the plugin with this change? Actually the easiest way to test it would be either

  1. using a postgres client to your trac database, drop table repository_node and drop table repository_version, then install the plugin from the git repository's master branch, and re-run the setup;
  2. or don't even bother with that, and instead just execute a postgres drop index command -- I'm not sure what the name of the index is (probably repository_node_contents_idx but I'm not 100% sure) so you might need to use a psql \d repository_node or something to find it.

Either way, once you've removed the offending index, could you try re-running the reindex_all command and see if the failure goes away?

comment:2 Changed 2 years ago by ejucovy

Typo in the above comment -- the name of the index that you should drop is probably repository_node_contents_idx (not repository_node_text_idx)

comment:3 Changed 2 years ago by ejucovy

I'll actually be migrating my own Trac installation at work from sqlite to postgres over the next week or so, so I'll have a chance to reproduce this bug and test the fix myself if you prefer to wait.

comment:4 Changed 2 years ago by ejucovy

  • Resolution set to fixed
  • Status changed from assigned to closed

I've migrated my own Trac instances to postgres after making the change that causes the index to no longer be created for new installations. I've confirmed that this does work.

So, to fix an existing installation that uses postgres and is getting this error, execute the following postgres statement in your trac db:

drop index repository_node_contents_idx;

(I'm not 100% sure that this is always the name of the index, so you might need to use a psql \d repository_node to find it if the above doesn't work.)

Add Comment

Modify Ticket

Action
as closed The owner will remain ejucovy.
The resolution will be deleted. Next status will be 'reopened'.
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.