Once upon a time there was a very famous computer scientist named Donald Knuth. He heard a quote from Tony Hoare and liked it so much that everyone now attributes it to him: “Premature optimization is the root of all evil.” Many people have run up against and been frustrated by misinterpretation or overapplication of this quote. Some of them have written cogent analyses of what is wrong with the quote (such as this one, which also contains a nice selection of links to other articles on the topic).
Now, this quote is wrong for many reasons. It is dangerous for two reasons:
- People want to believe it, because performance optimization is not easy and it can only really be learned through experience. Or, as Charles Cook says in his post:
A good software developer will do this automatically, having developed a feel for where performance issues will cause problems. An inexperienced developer will not bother, misguidedly believing that a bit of fine tuning at a later stage will fix any problems.
The inexperienced developer is much happier if he can point to a quote to justify his beliefs, instead of knowing that he has a lot of time and work ahead before he’ll actually understand the nuances of performance.
- There is some truth to it. Premature optimization is a bad idea. It’s just that knowing when performance optimization is not premature is at least as hard as doing performance optimization, and is even more something that can only be learned through experience.
So, I hereby offer up several recent examples of performance optimization I’ve done that may be useful for helping developers tune their own optimization skills and get a sense for when optimization is not premature. Like all good fables, this post contains a full range of examples: too late, too early, and just right.
The Papa Bear example is a set of revenue reports for an internal web application.
Revenue could be classified by year, business period cycle, sales region, user affiliation, or type. The executive dashboard provides for a wide array of reports that can select by or aggregate over any of these, plus aggregate by month/week/day.
Tip #1: “aggregated” “reports” is the always a red flag that suggests that the need for performance optimization is just around the corner.
Revenue is contained in an orders table which is 50K rows. User affiliation is obtained through a join to the user table (100K rows) and the affiliate table (50k rows). The sales_region table is around 1000 rows. A given report often returns 1000 rows, one for each sales region.
Tip #2: Joining a table over 100K rows to a table over 50K rows is rarely something you want to do in real time.
The reports ran OK, if slowly, when first deployed without optimization. But by the time I got to this reporting system, some of the reports (like the “All regions” dashboard) were taking so long to run that Apache was timing them out; no one had seen them for several days.
Obviously, performance optimization was considered rather too late.
Fix: I created an aggregation table and pre-joined the revenue to its attribute fields in a classic star schema fact table design. Total database usage for the “All regions” dashboard went from 50+ seconds to under 3 seconds (there was still more to optimize in the revenue projections, but that was deemed not worth the effort to gain an additional few seconds).
The Mama Bear example owes full credit to me: I optimized this one way too early.
My company had its own custom phone call routing software, because it sold toll-free numbers to many different clients. An earlier version of the software had accessed the database every time a call came in to determine which actual number to connect the toll-free call with; that software was prone to long delays in connecting calls as it tried to access the database (on a different server), and even failures if the database wasn’t accessible. So when I rewrote the software I cached all the toll-free->local mappings in an in-memory hashtable, which both made lookup incredibly fast and allowed a routing server to remain up even if the database went down. So far, so good.
Then we added geo-targeted routing, in which a call to a toll-free number could be routed to one of several (or several hundred) numbers depending on the nominal region of the caller’s phone number, with a single fallback number to default to if the location didn’t have a specific geo-target. Obviously, I didn’t want to risk the same performance issues we’d had with the direct routing. So I cached all of the geo-targeted routings in memory as well.
Tip #3: When caching in memory to improve performance, make sure to take into consideration the size of what you are caching.
Tip #4: The problem you solved before might not be the problem you are solving now. Did all of the geotargets need to work flawlessly, or just the default number?
The new geo-targeted routing lookup certainly was fast. Except… it also had an enormous memory footprint. While less than 5% or so of our numbers had georouting, they took up 99% of the memory of the app. We very quickly ran up against the limitations of Java heap size on Linux and had to start doing the hacks that allowed for larger heaps. It also became a much more difficult problem to keep this much larger, double-hashed cache in sync with the actual routing as specified in the database. When I left, they were just looking into switching back to database lookup for geo-targeted routing, keeping only the default number in cache. Which, given the small number of customers using this service, would have been the right answer all along.
I’m quite proud of Baby Bear — it almost qualifies as “just-in-time” performance optimization.
We had a web site on which customers could search user profiles. They could search by a wide variety of things, including location, name, education, job history, specializations, and paragraph-long answers to four questions. Except for location, all of the searching was full-text: we took the text they entered into the search box and searched all of the fields for matches.
Tip #5: Full-text search can get slow when done directly in SQL.
We knew that, but we were in a hurry to launch. So we implemented the search directly in SQL. And at launch it was fine. Two months later, when we had 250 profiles and started our first marketing push, it was fine. At the end of the year, we had 1000 profiles, and it was starting to get a little slow, taking maybe 10 seconds to come back from all but location searches.
Tip #6: Always monitor performance and pay attention to anything that is slowing noticeably.
I paid attention. I assigned myself “Worry about profile search performance” as a to-do in Basecamp. Then I talked with one of the developers and had him change the search to use sphinx so that we could do our full-text search using a full-text search engine. Search time dropped from 10 seconds to well under a second. No one noticed. (It seems that 10 seconds was not quite long enough to be irritating.)
Six months later, we had an even bigger marketing push and got up to 7000 profiles. Given our previous benchmarks, search response time would have gone up to at least 70 seconds if we hadn’t made the change, which definitely would have been long enough to be irritating. But because we had switched to sphinx, search times… remained under a second.
Tip #7: The highest form of success in performance optimization is having no one notice that you did it, even as usage scales drastically.
In fact, Tip #7 may be the morale of this story. The first example wasn’t successful performance optimization because everyone noticed the dashboard timing out. The second example wasn’t successful because people noticed the excessive memory usage. The third example was successful because no one noticed anything at all.
And that’s the end of our story. All the programs perform better, all the readers are a little bit better at performance optimization, and everyone lived happily ever after….