#4503 closed enhancement (fixed)
[PATCH] performance issues
Reported by: | Shane Caraveo | Owned by: | Steffen Hoffmann |
---|---|---|---|
Priority: | normal | Component: | TagsPlugin |
Severity: | major | Keywords: | performance |
Cc: | andref.dias@…, Michael Renzmann, Ryan J Ollos, osimons | Trac Release: | 0.11 |
Description
The tags plugin is excellent, however, there are some pretty big performance issues.
One is, regardless of specifying a realm (via tags search page or a macro), all tag providers are asked for their tags, and only after that is the realm checked.
The second is checking permissions on each ticket (in the ticket provider). This is fine if you have a few hundred (or thousand) tickets. In a database with a couple hundred thousand tickets, the tags plugin becomes unusable. Relying on a ticket_view and tags_view at a "global" level is sufficient for the majority of cases.
The attached patch addresses both issues above, though I don't regard it as a "final" patch.
Attachments (6)
Change History (34)
Changed 16 years ago by
Attachment: | tagsplugin-perf.patch added |
---|
comment:1 Changed 16 years ago by
Cc: | andref.dias@… added; anonymous removed |
---|---|
Type: | defect → enhancement |
These performance issues are of great interest to me too!
comment:2 Changed 16 years ago by
The patch looks reasonable, though ideally the permission changes would be behind a configuration option - defaulting to the current behaviour, but configurable for a fast path.
comment:3 follow-up: 4 Changed 15 years ago by
Owner: | changed from Alec Thomas to Michael Renzmann |
---|
comment:4 Changed 15 years ago by
comment:5 Changed 15 years ago by
(In [7381]) In a query, a realm may be specified to limit results to the corresponding tag provider. So far all providers were polled anyway, which had a negative impact on performance. This is now fixed. Thanks to mixedpuppy for the patch. Refs #4503.
Changed 15 years ago by
Attachment: | lazy_permission_check.patch added |
---|
Introduce new ini option that allows to skip "per ticket" permission checks
comment:6 Changed 15 years ago by
Status: | new → assigned |
---|
lazy_permission_check.patch
implements skipping per-ticket permissions as suggested by mixedpuppy, combined with an ini option that allows to enable this feature as suggested by athomas. The default is to stick to the old (current) behaviour.
Please review and test the patch. Once one confirms it works as intended, I'll commit it to the repository.
comment:7 Changed 13 years ago by
Cc: | Michael Renzmann added |
---|---|
Owner: | changed from Michael Renzmann to Steffen Hoffmann |
Status: | assigned → new |
Reading this ticket made me really sad.
There is a claim, but the reporter isn't able to test and provide feedback on an alternative fix within more than 1 1/2 years. No facts have been provided for these issues as well: "... fine if you have a few hundred (or thousand) tickets. In a database with a couple hundred thousand tickets, the tags plugin becomes unusable ..." - just rhetoric.
What are i.e. numbers and corresponding process/page load times? And why on earth nobody has been screaming for a pager? TracSearch and TracQuery have one. So TagsQuery will profit from getting one as well.
Again, step of with facts, results, please and don't discredit your report further by silence. In absence of such information I prepared a test environment with 30.000 test tickets. Man, this is dragging response time down, but only for formatting of results it seems:
- 6766 of 30.000 tickets tagged with 'Test'
- /tags?q=Test
- query_results from db returned in 1,7 * 10-6 s
- page with table view and 6767 rows rendered in ~ 50 s
Still I count on real application testing for upcoming features. Do you follow? Anyone?
comment:8 Changed 13 years ago by
Status: | new → assigned |
---|
Permission checks are not an issue.
At least there is no measurable difference in my test environment, while the query logic might introduce remarkable slowdowns. And the Rendering does for sure, so a pager is a must-have to survive with several hundred or thousands of similarly tagged resources. Testing this right now.
The lazy permission check might get dropped in turn, if no one can step up with test results in favor of it. By now I'm unable to confirm it's effect with my test data. But I have no fine-grained per-resource restrictions applied here either, so this might mean nothing, perhaps.
comment:9 Changed 13 years ago by
(In [10737]) TagsPlugin: Wrap ListTagged results in Trac pager, refs #4503 and #9064.
This fixes SystemError: ../Objects/tupleobject.c ...
occasionally seen
with large sets of results. And response time is getting reasonable even
for 30.000+ matching tagged resources.
Beware: Hold on, it's a feature-incomplete changeset - a step in transition.
comment:10 Changed 13 years ago by
(In [10738]) TagsPlugin: Move Genshi builder instructions to fragment template, refs #4503 and #9064.
Now all list formats and even the warning message about obsolete TagsQuery arguments are integrated into a HTML fragment template. Various goodies like differently shaded odd/even rows and 'no results message' have been added. Oddly squeezed ListTagged tables are gone as well - all due to re-using big parts of the TicketQuery code from Trac core.
comment:11 Changed 13 years ago by
Cc: | Ryan J Ollos added |
---|
comment:12 Changed 13 years ago by
(In [10781]) TagsPlugin: Add mincount functionality to main tags page, refs #4078 and #4503.
cloud_mincount
allows to cut default cloud size and reduce cluttering
and render time for the /tags
page a bit when tagging is in heavy use.
Calls of wiki macro TagCloud
are not affected by this configuration option.
comment:13 Changed 12 years ago by
(In [13165]) TagsPlugin: Mirror ticket tags into normalized tags
db table, refs #4503, #9194 and #11096.
Parsing ticket fields per request has been a known performance hog, that's why
using the tags
db table for regular access yields a considerable speed-up.
As a side-effect 'ticket' is the pilot for fixing a recently announced issue
with get_tagged_resources
in the default tag provider implementation.
Nicely polished by stretching unit test coverage to this part of the code.
ToDo: DefaultTagProvider
must be fixed as well, and other tag
providers should be reviewed too as a precaution.
comment:14 Changed 12 years ago by
(In [13166]) TagsPlugin: Cache get_tagged_resources
for tickets, refs #4503.
Current implementation is for Trac >= 0.12, but likely a wontfix for 0.11, if no code is contributed for getting it there, sorry.
comment:15 Changed 12 years ago by
(In [13168]) TagsPlugin: Tap into the potential of permission check short-cuts, refs #4503.
This has been based on the three years old lazy_permission_check.patch
by
otaku42, thanks a lot anyway.
comment:16 Changed 12 years ago by
Cc: | osimons added |
---|
This ends a development process with some patches drafted almost two years ago.
Not only due to vague user feedback this has been a rather long and demanding task. Learning from osimons about profiling Trac requests and preparing a test setup with sufficiently enough tags and tickets to clearly spot performance issue, these have been the major challenges. Finally the following attachments have the numbers from requests to TagsPlugin's tag cloud (/tags
) running in a Python virtual environment with
- Trac-1.0.1
- Genshi-0.6
on a Debian 6 machine with (dual core) 2.0 GHz AMD Athlon 64 X2 and 5 GB RAM (an aged, yet well-performing developer system).
Changed 12 years ago by
Attachment: | 20130519_profiling_tractags_0.7dev-r13134.txt.gz added |
---|
details of 4th request to /tags
provided by stock TagsPlugin code
Changed 12 years ago by
Attachment: | 20130519_profiling_tractags_0.7dev-r13167.txt.gz added |
---|
details of 4th request to /tags
provided by TagsPlugin using ticket tags caching
Changed 12 years ago by
Attachment: | 20130519_profiling_tractags_0.7dev-r13168.txt.gz added |
---|
details of 4th request to /tags
provided by TagsPlugin using permission short-cuts (ticket_fast_permcheck = True
) in concert with ticket tags caching
comment:17 Changed 12 years ago by
Profiling summary (except):
- caching tags of ~ 30.000 ticket (little more than 1 tag/ticket on average) lowers response time a bit despite of increase in primitive calls - expecting more impressive numbers for more tags per ticket
- skipping fine-grained view permission checks not only halves the number of primitive calls, but significantly cuts response time down from 6.6 to 4.0 seconds
- constantly high amount of time spent in
tractags.query.Query
, mostly_match
method
so improving, or even obsoleting the query is the logical next step.
Conclusion: Despite of spending much time we did not even touch the top reason for long response times. This observation nicely backs the general depreciation of so-called 'premature optimization' in Python in general.
comment:18 Changed 12 years ago by
(In [13172]) TagsPlugin: Cache a configuration value for performance, refs #4503.
Impact and effect of this change are amazing, verified by request-profiling.
Changed 12 years ago by
Attachment: | 20130520_profiling_tractags_0.7dev-r13172.txt.gz added |
---|
details of 4th request to /tags
provided by TagsPlugin with/without fine-grained permission checks when caching ticket_fast_permcheck
comment:19 Changed 12 years ago by
(In [13173]) TagsPlugin: Replace configuration option by simple configuration check, refs #4503.
Finding any non-standard/default permission policy, fine-grained permission checking will be enforced, intentionally removing the opt-out for safety. Thanks to Odd Simon Simonsen for bringing-up the idea in the first place.
comment:22 Changed 11 years ago by
Unit tests has the following failure with Python 2.4. __missing__
method is available since Python 2.5.
FAIL: Doctest: tractags.api.Counter.__add__ ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.4/doctest.py", line 2153, in runTest raise self.failureException(self.format_failure(new.getvalue())) AssertionError: Failed doctest test for tractags.api.Counter.__add__ File "/home/jun66j5/src/trac/trac-hacks/tagsplugin/trunk/tractags/api.py", line 161, in __add__ ---------------------------------------------------------------------- File "/home/jun66j5/src/trac/trac-hacks/tagsplugin/trunk/tractags/api.py", line 164, in tractags.api.Counter.__add__ Failed example: Counter('abbb') + Counter('bcc') Exception raised: Traceback (most recent call last): File "/usr/lib/python2.4/doctest.py", line 1244, in __run compileflags, 1) in test.globs File "<doctest tractags.api.Counter.__add__[0]>", line 1, in ? Counter('abbb') + Counter('bcc') File "/home/jun66j5/src/trac/trac-hacks/tagsplugin/trunk/tractags/api.py", line 172, in __add__ newcount = self[elem] + other[elem] KeyError: 'a'
The following patch uses __getitem__
method which handles KeyError
instead of __missing__
.
-
tractags/api.py
82 82 """ 83 83 self.update(iterable, **kwargs) 84 84 85 def __missing__(self, key): 86 return 0 85 def __getitem__(self, key): 86 try: 87 return dict.__getitem__(self, key) 88 except KeyError: 89 return 0 87 90 88 91 def most_common(self, n=None): 89 92 """List the n most common elements and their counts from the most
comment:23 Changed 11 years ago by
Or passing 0 for default parameter of dict.get
.
-
tractags/api.py
diff --git a/tractags/api.py b/tractags/api.py index 7320bcf..0ea3795 100644
a b class Counter(dict): 82 82 """ 83 83 self.update(iterable, **kwargs) 84 84 85 def __missing__(self, key):86 return 087 88 85 def most_common(self, n=None): 89 86 """List the n most common elements and their counts from the most 90 87 common to the least. If n is None, then list all element counts. … … class Counter(dict): 168 165 if not isinstance(other, Counter): 169 166 return NotImplemented 170 167 result = Counter() 168 self_get = self.get 169 other_get = other.get 171 170 for elem in set(self) | set(other): 172 newcount = self [elem] + other[elem]171 newcount = self_get(elem, 0) + other_get(elem, 0) 173 172 if newcount > 0: 174 173 result[elem] = newcount 175 174 return result
comment:24 Changed 11 years ago by
Avoiding exceptions - the second approach - looks cleaner to me than handling them, so I prefer that.
performance patch