Modify

Opened 9 months ago

Last modified 9 months ago

#11501 new defect

Error when migrating from sqlite to postgresql

Reported by: me@… Owned by: ejucovy
Priority: normal Component: MultiRepoSearchPlugin
Severity: normal Keywords:
Cc: Trac Release:

Description (last modified by jun66j5)

Hi,

I'm new to Trac but after using the MultiRepoSearchPlugin I find that the searches are pretty slow. I'm hoping that switching to a real database will speed up searches (full text indexes, etc).

Attempting to issue a migrate to switch over to postresql gives this error:

Copying tables:
  attachment table... 0 records.
  auth_cookie table... 1 records.
  cache table... 3 records.
  component table... 4 records.
  enum table... 13 records.
  mastertickets table... 0 records.
  milestone table... 0 records.
  node_change table... 9641 records.
  permission table... 30 records.
  report table... 8 records.
  repository table... 8 records.
OperationalError: Could not decode to UTF-8 column 'contents' with text 'OggS'
  repository_node table... 

The same error occurs when attempting to use MySQL as well (which is what I prefer).

-Andrew

Attachments (0)

Change History (6)

comment:1 Changed 9 months ago by jun66j5

  • Component changed from TracMigratePlugin to MultiRepoSearchPlugin
  • Description modified (diff)
  • Owner changed from jun66j5 to ejucovy

That is a MultiRepoSearchPlugin issue. The plugin creates repository_node.contents as TEXT column and stores contents of files in the repositories into the column. The column must be stored utf-8 string cause of TEXT column, not binary. However, the plugin would store the contents without changes to repository_node.contents.

comment:2 Changed 9 months ago by anonymous

What do you recommend to fix this? Would be safe to truncate the repository_node.contents table then just resync the search plugin (or find another since this one looks like it's not excluding binary files) after the migration?

-Andrew

comment:3 Changed 9 months ago by me@…

So I removed the contents column from repository_node and now I'm getting this error:

TypeError: unsupported operand type(s) for /: 'int' and 'NoneType'

Any ideas?

-Andrew

comment:4 Changed 9 months ago by me@…

Never mind! Got it figured out, I forgot to change the collation of the MySQL DB to utf8mb4_bin.

So if anyone else has this issue remove the repository_node.contents column and if using MySQL then make sure you change the DB collation to utf8mb4_bin.

-Andrew

comment:5 Changed 9 months ago by ejucovy

Thanks jun66j5 - I should have realized that passing the result of to_unicode directly to a database cursor wouldn't be a great idea. :-P I'll look into the proper way to fix this for MultiRepoSearchPlugin.

Andrew, just want to (belatedly) confirm that your approach is a good workaround - since MultiRepoSearchPlugin just uses its database tables to cache repository contents, you can safely clear them out altogether (both repository_node and repository_version) and then re-run multireposearch reindex_all after your migration to repopulate those tables.

I'll also look into automatically excluding binary files in the next release. (Note to self: trac.mimeview.api.is_binary sure looks promising.)

Lastly, I should warn that searches can still be pretty slow -- I also migrated my Trac site from sqlite (to postgres) for the same reason, and it's certainly an improvement over the sqlite performance, but the plugin itself doesn't currently do anything to help you set up full-text indexes etc. So by default even on MySQL or Postgres your searches will basically be iterating over all rows in repository_node with LIKE statements. If you're comfortable setting up a full-text index directly in your database backend against repository_node.contents, that will likely give much better performance. I haven't had time to look into this (let alone benchmark it) so I've just been living with slow searches on my own Trac installation, but if you do set this up I'd be very eager to hear what you did to set up the indexes and whether it improved performance, if only to add to the "Installation" docs.

comment:6 Changed 9 months ago by me@…

ejucovy when do you plan on releasing a new version to exclude binary files?

When I originally used the plugin it caused the sqlite DB to increase to 13GiB in size! Once the binary issue is figured out I'll re-enable the plugin and create some indexes on the tables but since Trac uses InnoDB tables, the fulltext searches will work completely different:

http://dev.mysql.com/doc/refman/5.6/en/innodb-table-and-index.html#innodb-fulltext-index
http://www.mysqlperformanceblog.com/2013/03/04/innodb-full-text-search-in-mysql-5-6-part-2-the-queries/

There might need to be some rejigging of the SQL queries depending on if the DB is sqlite, MySQL or PostgreSQL.

Also do you have any tips for development of Trac plugins? Should I use easy_install to remove the old version, make changes to a copy then reinstall using easy_install. Or is it possible to modify some Python source of the plugin on the fly? I'm having trouble finding where easy_install puts the plugin files, and the documentation for Trac isn't really the greatest!

-Andrew

Add Comment

Modify Ticket

Action
as new .
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.