Modify

Opened 15 years ago

Closed 10 years ago

Last modified 10 years ago

#4503 closed enhancement (fixed)

[PATCH] performance issues

Reported by: Shane Caraveo Owned by: Steffen Hoffmann
Priority: normal Component: TagsPlugin
Severity: major Keywords: performance
Cc: andref.dias@…, Michael Renzmann, Ryan J Ollos, osimons Trac Release: 0.11

Description

The tags plugin is excellent, however, there are some pretty big performance issues.

One is, regardless of specifying a realm (via tags search page or a macro), all tag providers are asked for their tags, and only after that is the realm checked.

The second is checking permissions on each ticket (in the ticket provider). This is fine if you have a few hundred (or thousand) tickets. In a database with a couple hundred thousand tickets, the tags plugin becomes unusable. Relying on a ticket_view and tags_view at a "global" level is sufficient for the majority of cases.

The attached patch addresses both issues above, though I don't regard it as a "final" patch.

Attachments (6)

tagsplugin-perf.patch (2.3 KB) - added by Shane Caraveo 15 years ago.
performance patch
lazy_permission_check.patch (1.5 KB) - added by Michael Renzmann 14 years ago.
Introduce new ini option that allows to skip "per ticket" permission checks
20130519_profiling_tractags_0.7dev-r13134.txt.gz (7.3 KB) - added by Steffen Hoffmann 11 years ago.
details of 4th request to /tags provided by stock TagsPlugin code
20130519_profiling_tractags_0.7dev-r13167.txt.gz (7.4 KB) - added by Steffen Hoffmann 11 years ago.
details of 4th request to /tags provided by TagsPlugin using ticket tags caching
20130519_profiling_tractags_0.7dev-r13168.txt.gz (7.3 KB) - added by Steffen Hoffmann 11 years ago.
details of 4th request to /tags provided by TagsPlugin using permission short-cuts (ticket_fast_permcheck = True) in concert with ticket tags caching
20130520_profiling_tractags_0.7dev-r13172.txt.gz (14.2 KB) - added by Steffen Hoffmann 11 years ago.
details of 4th request to /tags provided by TagsPlugin with/without fine-grained permission checks when caching ticket_fast_permcheck

Download all attachments as: .zip

Change History (34)

Changed 15 years ago by Shane Caraveo

Attachment: tagsplugin-perf.patch added

performance patch

comment:1 Changed 15 years ago by andref.dias@…

Cc: andref.dias@… added; anonymous removed
Type: defectenhancement

These performance issues are of great interest to me too!

comment:2 Changed 15 years ago by Alec Thomas

The patch looks reasonable, though ideally the permission changes would be behind a configuration option - defaulting to the current behaviour, but configurable for a fast path.

comment:3 Changed 14 years ago by Michael Renzmann

Owner: changed from Alec Thomas to Michael Renzmann

I just committed the first part of your patch (regarding realm) in r7377. Will try to come up with a patch for part two for the permission checks, where a configuration option modifies the behaviour (as athomas suggested above).

comment:4 in reply to:  3 Changed 14 years ago by Michael Renzmann

Replying to otaku42:

I just committed the first part of your patch (regarding realm) in r7377.

Guess I had a bad day:

I accidentally committed stuff to the tagged release v0.6 rather than to trunk, which of course was wrong. I'll thus roll back r7377 and re-apply it to trunk.

comment:5 Changed 14 years ago by Michael Renzmann

(In [7381]) In a query, a realm may be specified to limit results to the corresponding tag provider. So far all providers were polled anyway, which had a negative impact on performance. This is now fixed. Thanks to mixedpuppy for the patch. Refs #4503.

Changed 14 years ago by Michael Renzmann

Attachment: lazy_permission_check.patch added

Introduce new ini option that allows to skip "per ticket" permission checks

comment:6 Changed 14 years ago by Michael Renzmann

Status: newassigned

lazy_permission_check.patch implements skipping per-ticket permissions as suggested by mixedpuppy, combined with an ini option that allows to enable this feature as suggested by athomas. The default is to stick to the old (current) behaviour.

Please review and test the patch. Once one confirms it works as intended, I'll commit it to the repository.

comment:7 Changed 12 years ago by Steffen Hoffmann

Cc: Michael Renzmann added
Owner: changed from Michael Renzmann to Steffen Hoffmann
Status: assignednew

Reading this ticket made me really sad.

There is a claim, but the reporter isn't able to test and provide feedback on an alternative fix within more than 1 1/2 years. No facts have been provided for these issues as well: "... fine if you have a few hundred (or thousand) tickets. In a database with a couple hundred thousand tickets, the tags plugin becomes unusable ..." - just rhetoric.

What are i.e. numbers and corresponding process/page load times? And why on earth nobody has been screaming for a pager? TracSearch and TracQuery have one. So TagsQuery will profit from getting one as well.

Again, step of with facts, results, please and don't discredit your report further by silence. In absence of such information I prepared a test environment with 30.000 test tickets. Man, this is dragging response time down, but only for formatting of results it seems:

  • 6766 of 30.000 tickets tagged with 'Test'
  • /tags?q=Test
  • query_results from db returned in 1,7 * 10-6 s
  • page with table view and 6767 rows rendered in ~ 50 s

Still I count on real application testing for upcoming features. Do you follow? Anyone?

comment:8 Changed 12 years ago by Steffen Hoffmann

Status: newassigned

Permission checks are not an issue.

At least there is no measurable difference in my test environment, while the query logic might introduce remarkable slowdowns. And the Rendering does for sure, so a pager is a must-have to survive with several hundred or thousands of similarly tagged resources. Testing this right now.

The lazy permission check might get dropped in turn, if no one can step up with test results in favor of it. By now I'm unable to confirm it's effect with my test data. But I have no fine-grained per-resource restrictions applied here either, so this might mean nothing, perhaps.

comment:9 Changed 12 years ago by Steffen Hoffmann

(In [10737]) TagsPlugin: Wrap ListTagged results in Trac pager, refs #4503 and #9064.

This fixes SystemError: ../Objects/tupleobject.c ... occasionally seen with large sets of results. And response time is getting reasonable even for 30.000+ matching tagged resources.

Beware: Hold on, it's a feature-incomplete changeset - a step in transition.

comment:10 Changed 12 years ago by Steffen Hoffmann

(In [10738]) TagsPlugin: Move Genshi builder instructions to fragment template, refs #4503 and #9064.

Now all list formats and even the warning message about obsolete TagsQuery arguments are integrated into a HTML fragment template. Various goodies like differently shaded odd/even rows and 'no results message' have been added. Oddly squeezed ListTagged tables are gone as well - all due to re-using big parts of the TicketQuery code from Trac core.

comment:11 Changed 12 years ago by Ryan J Ollos

Cc: Ryan J Ollos added

comment:12 Changed 12 years ago by Steffen Hoffmann

(In [10781]) TagsPlugin: Add mincount functionality to main tags page, refs #4078 and #4503.

cloud_mincount allows to cut default cloud size and reduce cluttering and render time for the /tags page a bit when tagging is in heavy use. Calls of wiki macro TagCloud are not affected by this configuration option.

comment:13 Changed 11 years ago by Steffen Hoffmann

(In [13165]) TagsPlugin: Mirror ticket tags into normalized tags db table, refs #4503, #9194 and #11096.

Parsing ticket fields per request has been a known performance hog, that's why using the tags db table for regular access yields a considerable speed-up.

As a side-effect 'ticket' is the pilot for fixing a recently announced issue with get_tagged_resources in the default tag provider implementation. Nicely polished by stretching unit test coverage to this part of the code.

ToDo: DefaultTagProvider must be fixed as well, and other tag providers should be reviewed too as a precaution.

comment:14 Changed 11 years ago by Steffen Hoffmann

(In [13166]) TagsPlugin: Cache get_tagged_resources for tickets, refs #4503.

Current implementation is for Trac >= 0.12, but likely a wontfix for 0.11, if no code is contributed for getting it there, sorry.

comment:15 Changed 11 years ago by Steffen Hoffmann

(In [13168]) TagsPlugin: Tap into the potential of permission check short-cuts, refs #4503.

This has been based on the three years old lazy_permission_check.patch by otaku42, thanks a lot anyway.

comment:16 Changed 11 years ago by Steffen Hoffmann

Cc: osimons added

This ends a development process with some patches drafted almost two years ago.

Not only due to vague user feedback this has been a rather long and demanding task. Learning from osimons about profiling Trac requests and preparing a test setup with sufficiently enough tags and tickets to clearly spot performance issue, these have been the major challenges. Finally the following attachments have the numbers from requests to TagsPlugin's tag cloud (/tags) running in a Python virtual environment with

  • Trac-1.0.1
  • Genshi-0.6

on a Debian 6 machine with (dual core) 2.0 GHz AMD Athlon 64 X2 and 5 GB RAM (an aged, yet well-performing developer system).

Changed 11 years ago by Steffen Hoffmann

details of 4th request to /tags provided by stock TagsPlugin code

Changed 11 years ago by Steffen Hoffmann

details of 4th request to /tags provided by TagsPlugin using ticket tags caching

Changed 11 years ago by Steffen Hoffmann

details of 4th request to /tags provided by TagsPlugin using permission short-cuts (ticket_fast_permcheck = True) in concert with ticket tags caching

comment:17 Changed 11 years ago by Steffen Hoffmann

Profiling summary (except):

  • caching tags of ~ 30.000 ticket (little more than 1 tag/ticket on average) lowers response time a bit despite of increase in primitive calls - expecting more impressive numbers for more tags per ticket
  • skipping fine-grained view permission checks not only halves the number of primitive calls, but significantly cuts response time down from 6.6 to 4.0 seconds
  • constantly high amount of time spent in tractags.query.Query, mostly _match method

so improving, or even obsoleting the query is the logical next step.

Conclusion: Despite of spending much time we did not even touch the top reason for long response times. This observation nicely backs the general depreciation of so-called 'premature optimization' in Python in general.

comment:18 Changed 11 years ago by Steffen Hoffmann

(In [13172]) TagsPlugin: Cache a configuration value for performance, refs #4503.

Impact and effect of this change are amazing, verified by request-profiling.

Changed 11 years ago by Steffen Hoffmann

details of 4th request to /tags provided by TagsPlugin with/without fine-grained permission checks when caching ticket_fast_permcheck

comment:19 Changed 11 years ago by Steffen Hoffmann

(In [13173]) TagsPlugin: Replace configuration option by simple configuration check, refs #4503.

Finding any non-standard/default permission policy, fine-grained permission checking will be enforced, intentionally removing the opt-out for safety. Thanks to Odd Simon Simonsen for bringing-up the idea in the first place.

comment:20 Changed 11 years ago by Steffen Hoffmann

In 13392:

TagsPlugin: Provide faster tag counter implementations by a Counter class, refs #4503.

Handling of tag collections in Counter objects is more efficient,
because we skip creation of associated resources and permission checks.
We use a feature-stripped, more PEP8-conform version of the original recipe.

Other Trac plugins, that ask TagsPlugin for existing tags frequency or need
just all tags, should avoid all methods beside TagSystem.get_all_tags too.

comment:21 Changed 10 years ago by Steffen Hoffmann

In 13427:

TagsPlugin: Push tags (counter) method into tag providers, refs #4503.

This is required for tag providers with tag store outside of default store.

comment:22 Changed 10 years ago by Jun Omae

Unit tests has the following failure with Python 2.4. __missing__ method is available since Python 2.5.

FAIL: Doctest: tractags.api.Counter.__add__
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.4/doctest.py", line 2153, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for tractags.api.Counter.__add__
  File "/home/jun66j5/src/trac/trac-hacks/tagsplugin/trunk/tractags/api.py", line 161, in __add__

----------------------------------------------------------------------
File "/home/jun66j5/src/trac/trac-hacks/tagsplugin/trunk/tractags/api.py", line 164, in tractags.api.Counter.__add__
Failed example:
    Counter('abbb') + Counter('bcc')
Exception raised:
    Traceback (most recent call last):
      File "/usr/lib/python2.4/doctest.py", line 1244, in __run
        compileflags, 1) in test.globs
      File "<doctest tractags.api.Counter.__add__[0]>", line 1, in ?
        Counter('abbb') + Counter('bcc')
      File "/home/jun66j5/src/trac/trac-hacks/tagsplugin/trunk/tractags/api.py", line 172, in __add__
        newcount = self[elem] + other[elem]
    KeyError: 'a'

The following patch uses __getitem__ method which handles KeyError instead of __missing__.

  • tractags/api.py

     
    8282        """
    8383        self.update(iterable, **kwargs)
    8484
    85     def __missing__(self, key):
    86         return 0
     85    def __getitem__(self, key):
     86        try:
     87            return dict.__getitem__(self, key)
     88        except KeyError:
     89            return 0
    8790
    8891    def most_common(self, n=None):
    8992        """List the n most common elements and their counts from the most

comment:23 Changed 10 years ago by Jun Omae

Or passing 0 for default parameter of dict.get.

  • tractags/api.py

    diff --git a/tractags/api.py b/tractags/api.py
    index 7320bcf..0ea3795 100644
    a b class Counter(dict): 
    8282        """
    8383        self.update(iterable, **kwargs)
    8484
    85     def __missing__(self, key):
    86         return 0
    87 
    8885    def most_common(self, n=None):
    8986        """List the n most common elements and their counts from the most
    9087        common to the least.  If n is None, then list all element counts.
    class Counter(dict): 
    168165        if not isinstance(other, Counter):
    169166            return NotImplemented
    170167        result = Counter()
     168        self_get = self.get
     169        other_get = other.get
    171170        for elem in set(self) | set(other):
    172             newcount = self[elem] + other[elem]
     171            newcount = self_get(elem, 0) + other_get(elem, 0)
    173172            if newcount > 0:
    174173                result[elem] = newcount
    175174        return result

comment:24 Changed 10 years ago by Steffen Hoffmann

Avoiding exceptions - the second approach - looks cleaner to me than handling them, so I prefer that.

comment:25 Changed 10 years ago by Steffen Hoffmann

In 13432:

TagsPlugin: Various fixes, refs #4503.

Thanks to Jun Omae for giving hints on the issues and for proposing patches.

comment:26 Changed 10 years ago by Steffen Hoffmann

In 13712:

TagsPlugin: Return all tags for matching tickets, refs #4503, #11226 and #11302.

When moving ticket tags to normalized storage in [13165], my adapted db query
did no longer retrieve all tags for matching tickets in ticket tag provider
method get_tagged_resources().

Using this method in TagSystem.replace_tag() consequently caused
reproducible loss of all ticket tags other than the new tag.
Corresponding unit test covers ticket with multiple tags now as well.

comment:27 Changed 10 years ago by Steffen Hoffmann

Resolution: fixed
Status: assignedclosed

In 13815:

TagsPlugin: Completing preparation for v0.7 release.

Availability of that code as stable, tagged release
closes #2429, #3359, #3610, #3624, #3677, #3754, #3864, #3947, #3983, #4078, #4277, #4503, #4799, #5523, #7787, #7857, #8638, #9057, #9058, #9059, #9060, #9061, #9062, #9063, #9149, #9210, #9521, #9630, #9636, #10032, #10416, #10636, #11096, #11147, #11152, #11274, #11302, #11658 and #11659.

Additionally there are some issues and enhancement requests showing progress,
but known to require more work to resolve them satisfactorily, specifically
refs #2804, #4200, #8747 and #9064.

Thanks to all contributors and followers, that enabled and encouraged a good
portion of this development work.

comment:28 Changed 10 years ago by Steffen Hoffmann

In 13860:

TagsPlugin: Merge code from KeywordSuggestPlugin's current trunk (0.5.0dev), refs #1344, #3816, #4201 and #4503.

Functional overlap of the aforementioned plugin with TagsPlugin is a fact.
TagsPlugin lacking auto-complete-style assistance for 'keyword' alias tag
input fields took a lot from its potential efficiency.

Consequently KeywordSuggestPlugin is getting integrated from now.
Maintaining said plugin separately as another low-profile plugin for users
that dislike TagsPlugin might still happen depending on developer priorities
and user feedback.

PEP8 clean-up and move to TagsPlugin >= 0.7 performance-enhanced API for
querying current tag list has been done on the way.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain Steffen Hoffmann.
The resolution will be deleted. Next status will be 'reopened'.

Add Comment


E-mail address and name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.