Arrow of time

Dealing with Trac performance problems

Trac is a great little web tool for both developers and random collaborators - it's a wiki integrated with a Subversion browser with simple project management tools and extensible with a lot of different plugins. Unfortunately it's also slow. I've had to deal with an Internet-facing Trac wiki and came up with some solutions which lessen this problem.

While Trac was great when I introduced it as a simple project management aid in a team of half a dozen people, it seriously underperforms when exposed to the Internet. The initial (default) installation on recent hardware (Xeon 5606) with FastCGI serves up to 8 pages per second per CPU core, which is ridiculously slow. I don't have the time to hack on Trac, but it looks like a major part of this slowness could be that it doesn't pre-render HTML when saving wiki pages and instead renders everything on the fly. But I could be wrong. I doubt my setup has an error, as I get the same performance on other servers where I have Trac installed (which I didn't bother to benchmark until now), and I've checked that FastCGI is actually working and doesn't reload Python, Trac, its environment, etc. on each request. I'm using Apache as the web server because I'm happy with it and, more importantly, know it inside-out. Configurations similar to the ones I describe here should be implementable in other web servers.

I dealt with this problem with two simple optimizations available in Apache: output caching and forcing HTTP "Expires" headers.

Output caching

The current state of the art in web server caching / reverse proxying is certainly Varnish, but setting it up (which would require running Apache on a different port and proxying it through Varnish) seemed like overkill. Instead, I've used Apache's built-in mod_cache, which is extremely rudimentary compared to Varnish but good enough for a quick fix. Mod_cache hooks into Apache's request processing, where it captures the response being sent to the client and caches it in case the same URL (along with a couple of important headers) is requested again in the near future. It is configured per virtual host, by specifying the URL subtrees it should handle.

The cache can be memory-based but I've configured a disk-based cache for pretty much the same reason Varnish does it - I trust the OS's virtual memory management, with which mod_cache plays nicely as it uses sendfile(2) to send the cached content out. It turns out that this works extremely well (at least with UFS, as ZFS doesn't play nicely with sendfile() and mmap()), as I get CPU loads with USER portion less than 1% and the SYS portion of 5-10%, saturating every Ethernet link I can throw at it.

The mod_cache portion of my configuration looks like this:

CacheRoot /srv/pubtracwiki/cache
CacheDirLength 3
CacheDirLevels 2

CacheIgnoreNoLastMod On
CacheIgnoreHeaders Set-Cookie

CacheLock On

CacheEnable disk /wiki
CacheEnable disk /chrome

The CacheRoot directive specifies the file system location where the cache will be created. The cache is sharded into subdirectories whose names are CacheDirLength characters long and which are nested CacheDirLevels deep. I've used CacheIgnoreNoLastMod because Trac doesn't set the Last-Modified HTTP header, and mod_cache ordinarily refuses to cache responses without one. The CacheIgnoreHeaders directive keeps the Set-Cookie header out of the cached objects; otherwise a cookie issued to one visitor would be stored with the page and replayed to everyone else, which is obviously not what we want here. Serving cookie-less cached pages breaks HTTP sessions, but that's ok for this specific installation. The CacheLock directive protects against the "thundering herd" issue, when a large number of clients request the same expired object at once and would all trigger its regeneration. Finally, the most important directive is CacheEnable, which configures the type of caching and which URL subtrees (examined on a per-virtual host basis) should be cached. I've selected /wiki and /chrome for caching - the former is the root for all wiki pages and the latter contains CSS, JS and image resources.
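
One thing the listing above leaves implicit is how long a cached page is considered fresh. The following is only a sketch with illustrative values that are not part of the original configuration. It assumes Trac sends neither an Expires header nor a Cache-Control max-age for wiki pages, in which case mod_cache falls back to CacheDefaultExpire (3600 seconds by default). CacheHeader is available in Apache 2.4 only.

```apache
# Illustrative values, not taken from the configuration above.
# Treat responses without explicit expiry information as fresh for
# 5 minutes instead of the default 3600 seconds.
CacheDefaultExpire 300

# Apache 2.4 only: add an X-Cache response header (HIT, MISS or REVALIDATE),
# which makes it easy to confirm that /wiki pages really come from the cache.
CacheHeader on
```

A shorter default expiry trades some cache efficiency for wiki edits showing up sooner on the public side.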

The reason I'm ok with breaking HTTP sessions is that this instance of Trac, on the regular HTTP port 80, is meant to be read-only, with all editing going on via authenticated HTTPS. The HTTPS instance of Trac (which shares the database with the Internet-facing one) is a plain install without caching.
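
The split between the two instances could be laid out roughly as follows. This is a minimal sketch, not the actual configuration: the host name and certificate paths are hypothetical, and the FastCGI and authentication wiring for Trac is left out.

```apache
# Public, read-only instance: responses are cached, cookies are not stored.
<VirtualHost *:80>
    ServerName trac.example.org            # hypothetical host name

    CacheRoot /srv/pubtracwiki/cache
    CacheIgnoreNoLastMod On
    CacheIgnoreHeaders Set-Cookie
    CacheEnable disk /wiki
    CacheEnable disk /chrome

    # ... Trac FastCGI wiring omitted ...
</VirtualHost>

# Editing instance: same Trac environment and database, no Cache* directives,
# so sessions and logins behave normally.
<VirtualHost *:443>
    ServerName trac.example.org            # hypothetical host name

    SSLEngine on
    SSLCertificateFile    /etc/ssl/certs/trac.example.org.crt     # hypothetical path
    SSLCertificateKeyFile /etc/ssl/private/trac.example.org.key   # hypothetical path

    # ... Trac FastCGI wiring and authentication omitted ...
</VirtualHost>
```

Because the Cache* directives live only in the port 80 virtual host, nothing served over HTTPS ever passes through the cache.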

Using mod_cache has brought the performance up from 16 page loads per second to more pages per second than I need to fully saturate my network, because it effectively transforms Trac into statically served content. The cache is maintained (pruned) by htcacheclean, without which it would grow unbounded.

Caching in general is a tricky subject, and we are lucky to have a guide to mod_cache as a part of Apache's official documentation.

Forcing HTTP "Expires" headers

This doesn't have much to do with the server side of the setup but rather with the client side. The HTTP "Expires" header effectively tells browsers not to bother re-requesting objects from the server (not even to receive a "304 Not Modified" response) until a certain time has passed. It is a powerful way to make better use of browser caches.

My mod_expires configuration for Trac looks like this:

ExpiresActive On
ExpiresByType application/javascript A3600
ExpiresByType text/css A3600
ExpiresByType image/png A3600
ExpiresByType image/gif A3600
ExpiresByType image/jpeg A3600

This configuration sets the Expires header for CSS, JS and image objects to 1 hour after they've been accessed (the "A3600" part). It causes all the Trac resources under /chrome to be fetched from the server only once in a typical user's browsing session, immensely reducing client-server round-trips.
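
For readability, mod_expires also accepts a verbose syntax that means the same thing as the "A3600" form. The sketch below shows it for two of the types. The last rule is an assumption rather than something taken from the setup above: depending on the MIME type mapping, JavaScript may be served under a type other than application/javascript, in which case that rule would never match.

```apache
ExpiresActive On

# Equivalent to A3600: expire one hour after the client accessed the object.
ExpiresByType text/css  "access plus 1 hour"
ExpiresByType image/png "access plus 1 hour"

# Assumption: cover an alternative JavaScript MIME type as well.
ExpiresByType text/javascript "access plus 1 hour"
```

Checking the Content-Type header of a response under /chrome shows which of the JavaScript rules actually applies.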

So there they are - a couple of simple techniques which made my Trac installations ready for public access. They will also work with most other dynamic Web applications.

#1 Re: Dealing with Trac performance problems

Added on 2012-04-06T15:34 by ...

Perhaps fossil-scm is better for you than trac.
