Modify

Opened 6 years ago

Closed 5 months ago

Last modified 5 months ago

#4503 closed enhancement (fixed)

[PATCH] performance issues

Reported by: mixedpuppy Owned by: hasienda
Priority: normal Component: TagsPlugin
Severity: major Keywords: performance
Cc: andref.dias@…, otaku42, rjollos, osimons Trac Release: 0.11

Description

The tags plugin is excellent, however, there are some pretty big performance issues.

One is, regardless of specifying a realm (via tags search page or a macro), all tag providers are asked for their tags, and only after that is the realm checked.

The second is checking permissions on each ticket (in the ticket provider). This is fine if you have a few hundred (or thousand) tickets. In a database with a couple hundred thousand tickets, the tags plugin becomes unusable. Relying on a ticket_view and tags_view at a "global" level is sufficient for the majority of cases.

The attached patch addresses both issues above, though I don't regard it as a "final" patch.

Attachments (6)

tagsplugin-perf.patch (2.3 KB) - added by mixedpuppy 6 years ago.
performance patch
lazy_permission_check.patch (1.5 KB) - added by otaku42 5 years ago.
Introduce new ini option that allows to skip "per ticket" permission checks
20130519_profiling_tractags_0.7dev-r13134.txt.gz (7.3 KB) - added by hasienda 16 months ago.
details of 4th request to /tags provided by stock TagsPlugin code
20130519_profiling_tractags_0.7dev-r13167.txt.gz (7.4 KB) - added by hasienda 16 months ago.
details of 4th request to /tags provided by TagsPlugin using ticket tags caching
20130519_profiling_tractags_0.7dev-r13168.txt.gz (7.3 KB) - added by hasienda 16 months ago.
details of 4th request to /tags provided by TagsPlugin using permission short-cuts (ticket_fast_permcheck = True) in concert with ticket tags caching
20130520_profiling_tractags_0.7dev-r13172.txt.gz (14.2 KB) - added by hasienda 16 months ago.
details of 4th request to /tags provided by TagsPlugin with/without fine-grained permission checks when caching ticket_fast_permcheck

Download all attachments as: .zip

Change History (34)

Changed 6 years ago by mixedpuppy

performance patch

comment:1 Changed 6 years ago by andref.dias@…

  • Cc andref.dias@… added
  • Type changed from defect to enhancement

These performance issues are of great interest to me too!

comment:2 Changed 6 years ago by athomas

The patch looks reasonable, though ideally the permission changes would be behind a configuration option - defaulting to the current behaviour, but configurable for a fast path.

comment:3 follow-up: Changed 5 years ago by otaku42

  • Owner changed from athomas to otaku42

I just committed the first part of your patch (regarding realm) in r7377. Will try to come up with a patch for part two for the permission checks, where a configuration option modifies the behaviour (as athomas suggested above).

comment:4 in reply to: ↑ 3 Changed 5 years ago by otaku42

Replying to otaku42:

I just committed the first part of your patch (regarding realm) in r7377.

Guess I had a bad day:

I accidentally committed stuff to the tagged release v0.6 rather than to trunk, which of course was wrong. I'll thus roll back r7377 and re-apply it to trunk.

comment:5 Changed 5 years ago by otaku42

(In [7381]) In a query, a realm may be specified to limit results to the corresponding
tag provider. So far all providers were polled anyway, which had a negative
impact on performance. This is now fixed. Thanks to mixedpuppy for the
patch. Refs #4503.

Changed 5 years ago by otaku42

Introduce new ini option that allows to skip "per ticket" permission checks

comment:6 Changed 5 years ago by otaku42

  • Status changed from new to assigned

lazy_permission_check.patch implements skipping per-ticket permissions as suggested by mixedpuppy, combined with an ini option that allows to enable this feature as suggested by athomas. The default is to stick to the old (current) behaviour.

Please review and test the patch. Once one confirms it works as intended, I'll commit it to the repository.

comment:7 Changed 3 years ago by hasienda

  • Cc otaku42 added
  • Owner changed from otaku42 to hasienda
  • Status changed from assigned to new

Reading this ticket made me really sad.

There is a claim, but the reporter isn't able to test and provide feedback on an alternative fix within more than 1 1/2 years. No facts have been provided for these issues as well: "... fine if you have a few hundred (or thousand) tickets. In a database with a couple hundred thousand tickets, the tags plugin becomes unusable ..." - just rhetoric.

What are i.e. numbers and corresponding process/page load times? And why on earth nobody has been screaming for a pager? TracSearch and TracQuery have one. So TagsQuery will profit from getting one as well.

Again, step of with facts, results, please and don't discredit your report further by silence. In absence of such information I prepared a test environment with 30.000 test tickets. Man, this is dragging response time down, but only for formatting of results it seems:

  • 6766 of 30.000 tickets tagged with 'Test'
  • /tags?q=Test
  • query_results from db returned in 1,7 * 10-6 s
  • page with table view and 6767 rows rendered in ~ 50 s

Still I count on real application testing for upcoming features. Do you follow? Anyone?

comment:8 Changed 3 years ago by hasienda

  • Status changed from new to assigned

Permission checks are not an issue.

At least there is no measurable difference in my test environment, while the query logic might introduce remarkable slowdowns. And the Rendering does for sure, so a pager is a must-have to survive with several hundred or thousands of similarly tagged resources. Testing this right now.

The lazy permission check might get dropped in turn, if no one can step up with test results in favor of it. By now I'm unable to confirm it's effect with my test data. But I have no fine-grained per-resource restrictions applied here either, so this might mean nothing, perhaps.

comment:9 Changed 3 years ago by hasienda

(In [10737]) TagsPlugin: Wrap ListTagged results in Trac pager, refs #4503 and #9064.

This fixes SystemError: ../Objects/tupleobject.c ... occasionally seen
with large sets of results. And response time is getting reasonable even
for 30.000+ matching tagged resources.

Beware: Hold on, it's a feature-incomplete changeset - a step in transition.

comment:10 Changed 3 years ago by hasienda

(In [10738]) TagsPlugin: Move Genshi builder instructions to fragment template, refs #4503 and #9064.

Now all list formats and even the warning message about obsolete TagsQuery
arguments are integrated into a HTML fragment template. Various goodies like
differently shaded odd/even rows and 'no results message' have been added.
Oddly squeezed ListTagged tables are gone as well - all due to re-using big
parts of the TicketQuery code from Trac core.

comment:11 Changed 3 years ago by rjollos

  • Cc rjollos added

comment:12 Changed 3 years ago by hasienda

(In [10781]) TagsPlugin: Add mincount functionality to main tags page, refs #4078 and #4503.

cloud_mincount allows to cut default cloud size and reduce cluttering
and render time for the /tags page a bit when tagging is in heavy use.
Calls of wiki macro TagCloud are not affected by this configuration option.

comment:13 Changed 16 months ago by hasienda

(In [13165]) TagsPlugin: Mirror ticket tags into normalized tags db table, refs #4503, #9194 and #11096.

Parsing ticket fields per request has been a known performance hog, that's why
using the tags db table for regular access yields a considerable speed-up.

As a side-effect 'ticket' is the pilot for fixing a recently announced issue
with get_tagged_resources in the default tag provider implementation.
Nicely polished by stretching unit test coverage to this part of the code.

ToDo: DefaultTagProvider must be fixed as well, and other tag
providers should be reviewed too as a precaution.

comment:14 Changed 16 months ago by hasienda

(In [13166]) TagsPlugin: Cache get_tagged_resources for tickets, refs #4503.

Current implementation is for Trac >= 0.12, but likely a wontfix for 0.11,
if no code is contributed for getting it there, sorry.

comment:15 Changed 16 months ago by hasienda

(In [13168]) TagsPlugin: Tap into the potential of permission check short-cuts, refs #4503.

This has been based on the three years old lazy_permission_check.patch by
otaku42, thanks a lot anyway.

comment:16 Changed 16 months ago by hasienda

  • Cc osimons added

This ends a development process with some patches drafted almost two years ago.

Not only due to vague user feedback this has been a rather long and demanding task. Learning from osimons about profiling Trac requests and preparing a test setup with sufficiently enough tags and tickets to clearly spot performance issue, these have been the major challenges. Finally the following attachments have the numbers from requests to TagsPlugin's tag cloud (/tags) running in a Python virtual environment with

  • Trac-1.0.1
  • Genshi-0.6

on a Debian 6 machine with (dual core) 2.0 GHz AMD Athlon 64 X2 and 5 GB RAM (an aged, yet well-performing developer system).

Changed 16 months ago by hasienda

details of 4th request to /tags provided by stock TagsPlugin code

Changed 16 months ago by hasienda

details of 4th request to /tags provided by TagsPlugin using ticket tags caching

Changed 16 months ago by hasienda

details of 4th request to /tags provided by TagsPlugin using permission short-cuts (ticket_fast_permcheck = True) in concert with ticket tags caching

comment:17 Changed 16 months ago by hasienda

Profiling summary (except):

  • caching tags of ~ 30.000 ticket (little more than 1 tag/ticket on average) lowers response time a bit despite of increase in primitive calls - expecting more impressive numbers for more tags per ticket
  • skipping fine-grained view permission checks not only halves the number of primitive calls, but significantly cuts response time down from 6.6 to 4.0 seconds
  • constantly high amount of time spent in tractags.query.Query, mostly _match method

so improving, or even obsoleting the query is the logical next step.

Conclusion: Despite of spending much time we did not even touch the top reason for long response times. This observation nicely backs the general depreciation of so-called 'premature optimization' in Python in general.

comment:18 Changed 16 months ago by hasienda

(In [13172]) TagsPlugin: Cache a configuration value for performance, refs #4503.

Impact and effect of this change are amazing, verified by request-profiling.

Changed 16 months ago by hasienda

details of 4th request to /tags provided by TagsPlugin with/without fine-grained permission checks when caching ticket_fast_permcheck

comment:19 Changed 16 months ago by hasienda

(In [13173]) TagsPlugin: Replace configuration option by simple configuration check, refs #4503.

Finding any non-standard/default permission policy, fine-grained permission
checking will be enforced, intentionally removing the opt-out for safety.
Thanks to Odd Simon Simonsen for bringing-up the idea in the first place.

comment:20 Changed 12 months ago by hasienda

In 13392:

TagsPlugin: Provide faster tag counter implementations by a Counter class, refs #4503.

Handling of tag collections in Counter objects is more efficient,
because we skip creation of associated resources and permission checks.
We use a feature-stripped, more PEP8-conform version of the original recipe.

Other Trac plugins, that ask TagsPlugin for existing tags frequency or need
just all tags, should avoid all methods beside TagSystem.get_all_tags too.

comment:21 Changed 11 months ago by hasienda

In 13427:

TagsPlugin: Push tags (counter) method into tag providers, refs #4503.

This is required for tag providers with tag store outside of default store.

comment:22 Changed 11 months ago by jun66j5

Unit tests has the following failure with Python 2.4. __missing__ method is available since Python 2.5.

FAIL: Doctest: tractags.api.Counter.__add__
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.4/doctest.py", line 2153, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for tractags.api.Counter.__add__
  File "/home/jun66j5/src/trac/trac-hacks/tagsplugin/trunk/tractags/api.py", line 161, in __add__

----------------------------------------------------------------------
File "/home/jun66j5/src/trac/trac-hacks/tagsplugin/trunk/tractags/api.py", line 164, in tractags.api.Counter.__add__
Failed example:
    Counter('abbb') + Counter('bcc')
Exception raised:
    Traceback (most recent call last):
      File "/usr/lib/python2.4/doctest.py", line 1244, in __run
        compileflags, 1) in test.globs
      File "<doctest tractags.api.Counter.__add__[0]>", line 1, in ?
        Counter('abbb') + Counter('bcc')
      File "/home/jun66j5/src/trac/trac-hacks/tagsplugin/trunk/tractags/api.py", line 172, in __add__
        newcount = self[elem] + other[elem]
    KeyError: 'a'

The following patch uses __getitem__ method which handles KeyError instead of __missing__.

  • tractags/api.py

     
    8282        """ 
    8383        self.update(iterable, **kwargs) 
    8484 
    85     def __missing__(self, key): 
    86         return 0 
     85    def __getitem__(self, key): 
     86        try: 
     87            return dict.__getitem__(self, key) 
     88        except KeyError: 
     89            return 0 
    8790 
    8891    def most_common(self, n=None): 
    8992        """List the n most common elements and their counts from the most 

comment:23 Changed 11 months ago by jun66j5

Or passing 0 for default parameter of dict.get.

  • tractags/api.py

    diff --git a/tractags/api.py b/tractags/api.py
    index 7320bcf..0ea3795 100644
    a b class Counter(dict): 
    8282        """ 
    8383        self.update(iterable, **kwargs) 
    8484 
    85     def __missing__(self, key): 
    86         return 0 
    87  
    8885    def most_common(self, n=None): 
    8986        """List the n most common elements and their counts from the most 
    9087        common to the least.  If n is None, then list all element counts. 
    class Counter(dict): 
    168165        if not isinstance(other, Counter): 
    169166            return NotImplemented 
    170167        result = Counter() 
     168        self_get = self.get 
     169        other_get = other.get 
    171170        for elem in set(self) | set(other): 
    172             newcount = self[elem] + other[elem] 
     171            newcount = self_get(elem, 0) + other_get(elem, 0) 
    173172            if newcount > 0: 
    174173                result[elem] = newcount 
    175174        return result 

comment:24 Changed 11 months ago by hasienda

Avoiding exceptions - the second approach - looks cleaner to me than handling them, so I prefer that.

comment:25 Changed 11 months ago by hasienda

In 13432:

TagsPlugin: Various fixes, refs #4503.

Thanks to Jun Omae for giving hints on the issues and for proposing patches.

comment:26 Changed 7 months ago by hasienda

In 13712:

TagsPlugin: Return all tags for matching tickets, refs #4503, #11226 and #11302.

When moving ticket tags to normalized storage in [13165], my adapted db query
did no longer retrieve all tags for matching tickets in ticket tag provider
method get_tagged_resources().

Using this method in TagSystem.replace_tag() consequently caused
reproducible loss of all ticket tags other than the new tag.
Corresponding unit test covers ticket with multiple tags now as well.

comment:27 Changed 5 months ago by hasienda

  • Resolution set to fixed
  • Status changed from assigned to closed

In 13815:

TagsPlugin: Completing preparation for v0.7 release.

Availability of that code as stable, tagged release
closes #2429, #3359, #3610, #3624, #3677, #3754, #3864, #3947, #3983, #4078, #4277, #4503, #4799, #5523, #7787, #7857, #8638, #9057, #9058, #9059, #9060, #9061, #9062, #9063, #9149, #9210, #9521, #9630, #9636, #10032, #10416, #10636, #11096, #11147, #11152, #11274, #11302, #11658 and #11659.

Additionally there are some issues and enhancement requests showing progress,
but known to require more work to resolve them satisfactorily, specifically
refs #2804, #4200, #8747 and #9064.

Thanks to all contributors and followers, that enabled and encouraged a good
portion of this development work.

comment:28 Changed 5 months ago by hasienda

In 13860:

TagsPlugin: Merge code from KeywordSuggestPlugin's current trunk (0.5.0dev), refs #1344, #3816, #4201 and #4503.

Functional overlap of the aforementioned plugin with TagsPlugin is a fact.
TagsPlugin lacking auto-complete-style assistance for 'keyword' alias tag
input fields took a lot from its potential efficiency.

Consequently KeywordSuggestPlugin is getting integrated from now.
Maintaining said plugin separately as another low-profile plugin for users
that dislike TagsPlugin might still happen depending on developer priorities
and user feedback.

PEP8 clean-up and move to TagsPlugin >= 0.7 performance-enhanced API for
querying current tag list has been done on the way.

Add Comment

Modify Ticket

Action
as closed .
The resolution will be deleted. Next status will be 'reopened'.
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.