I'm preparing for the 1.0 release of Bullet Cache and have squashed the last (known) bug which plagued it, so I'm cautiously optimistic that it deserves the "1.0" label. It's been very fun working on it, and though none of this is terribly exciting news, I'd like to share a few things I've encountered while making it...
#1: Fast hardware isn't
When it comes to performance, there is absolutely no point in emphasizing one hardware subsystem while ignoring others. Early in the Bullet Cache development I bought a 2-socket Xeon 5405 system as my personal workstation, expecting to be able to explore concurrent programming to the fullest and to achieve blazing speeds from high-tech multicore-foo. That... didn't exactly happen. The system was highly constrained by the presence of the FSB and slow memory access (DDR2 FB-DIMM 666), so anything which involved any kind of memory traffic was kind of disappointing. Unfortunately, that was all I could afford at the time, and anyway the state of the art wasn't that much better either. The memory bottleneck only began to be solved with Nehalem CPUs. (I've also tried recent Opterons and frankly, they suck at this kind of context-switch-intensive task.)
The next problem is network connectivity - even though I'd bought a server-class Intel NIC, I had no idea I actually needed a multi-queue NIC to push high-volume TCP traffic, which meant that, again, none of the cool software design in Bullet Cache was actually visible when running on limited hardware.
#2: Fast software isn't
My development platform of choice is FreeBSD, partly because I'm used to it, but largely because it's really an excellent environment to work with. The compiler is in the base system, the man pages are complete and clearly state what is and isn't a POSIX API, and there's no hunting down development versions of libraries. It's just a very comfortable environment to program in.
Unfortunately, it kind of lags in multicore support in its network stack. Local Unix sockets and most of the UDP path are fully concurrent, but the TCP path is sadly not. Since I've tried to optimize Bullet Cache for a large number of small requests from a large number of clients, this translates into a lot of concurrent accesses across the TCP stack, whose locking simply isn't fine-grained enough to support them even with a multi-queue NIC. This is the reason I'm posting most of the benchmarks over Unix sockets. Linux is only a bit better.
#3: Even with significant C-foo, malloc bugs will bite you
I consider myself a C veteran, but it still surprises me how many bugs are introduced simply through memory management. I'm not talking about the mundane "oh, I forgot to initialize this pointer so I'm segfaulting on NULL" or "I have an int and I really want to write to its 10th element" things which feature in many C students' nightmares, but the more subtle kind, like "you know that complex calculation which decides the array index? well, congratulations, it has an integer overflow and now you're accessing NULL - 1" and "you know this piece of code which initializes a static variable and was never intended to be called by multiple threads at the same time? well, now it is, and you have both a memory leak from multiple malloc()s overwriting each other AND multiple bogus free()s on the same variable, corrupting malloc()'s internal structures."
And yet, there still often isn't a better tool for the job than C.
#3a: Thou shalt use Valgrind
I want to construct a monument to the authors of Valgrind. It should be distributed with gcc. That is all.
#4: Keep calm and carry on
Bugs will happen. Systems will have unexpected bottlenecks. Prices will rise, politicians will philander, you too will get old; and when you do, you'll fantasize that when you were young, prices were reasonable, politicians were noble, children respected their elders, and systems didn't suck back in the days of FreeBSD 4.0. Bollocks. What is important is to get some quality sleep at night - everything else can wait.